Abstract

This paper investigates two parameters that measure the coherence of a frame: worst-case and average coherence. 
We first use worst-case and average coherence to derive near-optimal probabilistic guarantees on both sparse signal 
detection and reconstruction in the presence of noise. Next, we provide a catalog of nearly tight frames with small 
worst-case and average coherence. Later, we find a new lower bound on worst-case coherence; we compare it to 
the Welch bound and use it to interpret recently reported signal reconstruction results. Finally, we give an algorithm 
that transforms frames in a way that decreases average coherence without changing the spectral norm or worst-case 
coherence. 
1. Introduction

Many classical applications, such as radar and error-correcting codes, make use of over-complete spanning
systems [46]. Oftentimes, we may view an over-complete spanning system as a frame. Take $F = \{f_i\}_{i \in \mathcal{I}}$ to be a
collection of vectors in some separable Hilbert space $\mathcal{H}$. Then $F$ is a frame if there exist frame bounds $A$ and $B$ with
$0 < A \le B < \infty$ such that $A\|x\|^2 \le \sum_{i \in \mathcal{I}} |\langle x, f_i \rangle|^2 \le B\|x\|^2$ for every $x \in \mathcal{H}$. When $A = B$, $F$ is called a tight frame. For
finite-dimensional unit norm frames, where $\mathcal{I} = \{1, \ldots, N\}$, the worst-case coherence is a useful parameter:

$\mu_F := \max_{i, j \in \{1, \ldots, N\},\, i \ne j} |\langle f_i, f_j \rangle|$. (1)

Note that orthonormal bases are tight frames with A = B = 1 and have zero worst-case coherence. In both ways, 
frames form a natural generalization of orthonormal bases. 

In this paper, we only consider finite-dimensional frames. Those not familiar with frame theory can simply view 
a finite-dimensional frame as an $M \times N$ matrix of rank $M$ whose columns are the frame elements. With this view, the
tightness condition is equivalent to having the spectral norm be as small as possible; for an $M \times N$ unit norm frame $F$,
this equivalently means $\|F\|_2^2 = N/M$.

Throughout the literature, applications require finite-dimensional frames that are nearly tight and have small
worst-case coherence [11, 21, 31, 37, 46, 47, 50, 56]. Among these, a foremost application is sparse signal processing,
where frames of small spectral norm and/or small worst-case coherence are commonly used to analyze sparse signals
[11, 21, 47, 50, 56]. In general, sparse signal processing deals with measurements of the form

$y = Fx + e$,

where $F$ is $M \times N$ with $M \ll N$, $x$ has at most $K$ nonzero entries, and $e$ is some sort of noise. When given measurements
y of x, one might be asked to reconstruct the original sparse vector x, or to find the locations of its nonzero entries, or 
to simply determine whether x is nonzero — each of these is a sparse signal processing problem. In some applications, 
the signal x is sparse in the identity basis, in which case F represents the measurement process. In other applications, 
$x$ is sparse in an orthonormal basis or an overcomplete dictionary $G$ [10]. In this case, $F$ is a composition of $A$, the
frame resulting from the measurement process, and $G$, the sparsifying dictionary, i.e., $F = AG$. We do not make
a distinction between the two formulations in this paper, but our results are most readily interpretable in a physical
setting for the former case. 

Recently, [5] introduced another notion of frame coherence called average coherence:

$\nu_F := \frac{1}{N-1} \max_{i \in \{1, \ldots, N\}} \Big| \sum_{j=1,\, j \ne i}^{N} \langle f_i, f_j \rangle \Big|$. (2)

Note that, in addition to having zero worst-case coherence, orthonormal bases also have zero average coherence. 
Intuitively, worst-case coherence is a measure of dissimilarity between frame elements, whereas average coherence 
measures how well the frame elements are distributed in the unit hypersphere. In sparse signal processing, there are 
a number of performance guarantees that depend only on worst-case coherence [20, 23, 25, 47]. These guarantees at
best allow for sparsity levels on the order of $\sqrt{M}$. Compressed sensing has brought guarantees that depend on the
Restricted Isometry Property, which is much more difficult to check, but the guarantees allow for sparsity levels on
the order of $M$ [6, 13, 14]. Recently, [5] used worst-case and average coherence to produce probabilistic guarantees
that also allow for sparsity levels on the order of $M$; these guarantees require that worst-case and average coherence
together satisfy the following property:

Definition 1. We say an $M \times N$ unit norm frame $F$ satisfies the Strong Coherence Property if

(SCP-1) $\mu_F \le \frac{1}{164 \log N}$ and (SCP-2) $\nu_F \le \frac{\mu_F}{\sqrt{M}}$,
where $\mu_F$ and $\nu_F$ are given by (1) and (2), respectively.
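
As a rough illustration of how the two coherence parameters and the Strong Coherence Property might be checked numerically, the following minimal sketch uses only the Python standard library. The function names and the list-of-columns representation of a frame are assumptions made for this example; they do not come from [5]. The property is meant for large frames, so the tiny frames below only exercise the definitions.

```python
import math

def inner(u, v):
    """Inner product <u, v>; conjugates the second argument so complex frames work."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def worst_case_coherence(frame):
    """mu_F from (1): largest |<f_i, f_j>| over distinct frame elements."""
    n = len(frame)
    return max(abs(inner(frame[i], frame[j]))
               for i in range(n) for j in range(i + 1, n))

def average_coherence(frame):
    """nu_F from (2): max_i |sum_{j != i} <f_i, f_j>| / (N - 1)."""
    n = len(frame)
    return max(
        abs(sum(inner(frame[i], frame[j]) for j in range(n) if j != i))
        for i in range(n)
    ) / (n - 1)

def strong_coherence_property(frame, constant=164.0):
    """Return the pair of booleans ((SCP-1) holds, (SCP-2) holds)."""
    n, m = len(frame), len(frame[0])
    mu, nu = worst_case_coherence(frame), average_coherence(frame)
    return mu <= 1.0 / (constant * math.log(n)), nu <= mu / math.sqrt(m)

# Orthonormal basis of R^3: both coherence parameters vanish.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mu_basis, nu_basis = worst_case_coherence(basis), average_coherence(basis)
scp_basis = strong_coherence_property(basis)

# Three unit vectors in R^2 at 120-degree spacing (a small tight frame).
mb = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]
mu_mb, nu_mb = worst_case_coherence(mb), average_coherence(mb)
scp_mb = strong_coherence_property(mb)
```

For the three-vector frame, every pairwise inner product equals $-1/2$, so both parameters equal $1/2$ and neither (SCP-1) nor (SCP-2) holds; this is expected for such a small $N$.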

The reader should know that the constant 164 is not particularly essential to the above definition; it is used in
[5] to simplify some analysis and make certain performance guarantees explicit, but the constant is by no means
optimal. This in mind, the requirement (SCP-1) can be interpreted more generally as $\mu_F = O(1/\log N)$. In the next
section, we will use the Strong Coherence Property to continue the work of [5]. Where [5] provided guarantees for
noiseless reconstruction, we will produce near-optimal guarantees for signal detection and reconstruction from noisy
measurements of sparse signals. These guarantees are related to those in [11, 21, 49, 50], and we will also elaborate
on this relationship.

The results given in [5] and Section 2, as well as the applications discussed in [11, 21, 31, 37, 46, 47, 50, 56]
demonstrate a pressing need for nearly tight frames with small worst-case and average coherence, especially in the 
area of sparse signal processing. This paper offers three additional contributions in this regard. In Section 3, we 
provide a sizable catalog of frames that exhibit small spectral norm, worst-case coherence, and average coherence. 
With all three frame parameters provably small, these frames are guaranteed to perform well in relevant applications. 
Next, performance in many applications is dictated by worst-case coherence [11, 21, 31, 37, 46, 47, 50, 56]. It is
therefore particularly important to understand which worst-case coherence values are achievable. To this end, the
Welch bound [46] is commonly used in the literature. However, the Welch bound is only tight when the number of
frame elements $N$ is less than the square of the spatial dimension $M$ [46]. Another lower bound, given in [38, 54],
beats the Welch bound when there are more frame elements, but it is known to be loose for real frames [18]. Given
this context, Section 4 gives a new lower bound on the worst-case coherence of real frames. Our bound beats both the
Welch bound and the bound in [38, 54] when the number of frame elements far exceeds the spatial dimension. Finally,
since average coherence is so new, there is currently no intuition as to when (SCP-2) is satisfied. In Section 5, we use 
ideas akin to the switching equivalence of graphs to transform a frame that satisfies (SCP-1) into another frame with 
the same spectral norm and worst-case coherence that additionally satisfies (SCP-2). 

Throughout the paper, we make use of certain notations that we address here. Recall, with big-O notation, that
$f(n) = O(g(n))$ if there exist positive $C$ and $n_0$ such that for all $n > n_0$, $f(n) \le Cg(n)$. Also, $f(n) = \Omega(g(n))$ if
$g(n) = O(f(n))$, and $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$. Additionally, we use $F_{\mathcal{K}}$ to denote the
matrix whose columns are taken from the matrix $F$ according to the index set $\mathcal{K}$. Similarly, we use $x_{\mathcal{K}}$ to denote the
column vector whose entries are taken from the column vector $x$ according to the index set $\mathcal{K}$. The column vector of
the $T$ largest entries in column vector $x$ is denoted by $x_T$. We also use $\|x\|$ to denote the $\ell_2$ norm of a vector $x$, while
$\|F\|_2$ is the spectral norm of a matrix $F$. Lastly, we use a star ($*$) to denote the matrix adjoint, a dagger ($\dagger$) to denote
the matrix pseudoinverse, and $I_K$ to denote the $K \times K$ identity matrix.
2. Worst-case and average coherence: Applications to sparse signal processing



Frames with small spectral norm, worst-case coherence, and/or average coherence have found use in recent years 
with applications involving sparse signals. Donoho et al. used the worst-case coherence in [21] to provide uniform
bounds on the signal and support recovery performance of combinatorial and convex optimization methods and greedy
algorithms. Later, Tropp [50] and Candès and Plan [11] used both the spectral norm and worst-case coherence to provide
tighter bounds on the signal and support recovery performance of convex optimization methods for most support
sets under the additional assumption that the sparse signals have independent nonzero entries with zero median.
Recently, Bajwa et al. [5] made use of the spectral norm and both coherence parameters to report tighter bounds on
the noisy model selection and noiseless signal recovery performance of an incredibly fast greedy algorithm called 
one-step thresholding (OST) for most support sets and arbitrary nonzero entries. In this section, we discuss further 
implications of the spectral norm and worst-case and average coherence of frames in applications involving sparse 
signals. 

2.1. The Weak Restricted Isometry Property 

A common task in signal processing applications is to test whether a collection of measurements corresponds
to mere noise [33]. For applications involving sparse signals, one can test measurements $y \in \mathbb{C}^M$ against the null
hypothesis $H_0 : y = e$ and alternative hypothesis $H_1 : y = Fx + e$, where the entries of the noise vector $e \in \mathbb{C}^M$
are independent, identical zero-mean complex-Gaussian random variables and the signal $x \in \mathbb{C}^N$ is $K$-sparse. The
performance of such signal detection problems is directly proportional to the energy in $Fx$ [19, 27, 33]. In particular,
existing literature on the detection of sparse signals [19, 27] leverages the fact that $\|Fx\|^2 \approx \|x\|^2$ when $F$ satisfies the
Restricted Isometry Property (RIP) of order $K$. In contrast, we now show that the Strong Coherence Property also
guarantees $\|Fx\|^2 \approx \|x\|^2$ for most $K$-sparse vectors. We start with a definition:

Definition 2. We say an $M \times N$ frame $F$ satisfies the $(K, \delta, p)$-Weak Restricted Isometry Property (Weak RIP) if for
every $K$-sparse vector $y \in \mathbb{C}^N$, a random permutation $x$ of $y$'s entries satisfies

$(1 - \delta)\|x\|^2 \le \|Fx\|^2 \le (1 + \delta)\|x\|^2$ (3)

with probability exceeding $1 - p$.
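
To make the role of the random permutation concrete, the sketch below estimates, for one fixed sparse vector $y$, how often a uniformly random permutation of its entries violates (3). It is only an empirical illustration under stated assumptions: the helper names are hypothetical, frames are real-valued and stored as lists of columns, and a Monte Carlo estimate for a single $y$ says nothing about every $K$-sparse vector, which is what Definition 2 quantifies over.

```python
import math
import random

def energy_ratio(frame, x):
    """Return ||Fx||^2 / ||x||^2, with `frame` given as a list of N real columns."""
    m = len(frame[0])
    fx = [sum(x[n] * frame[n][row] for n in range(len(frame))) for row in range(m)]
    return sum(v * v for v in fx) / sum(v * v for v in x)

def weak_rip_failure_rate(frame, y, delta, trials=2000, seed=0):
    """Fraction of random permutations x of y that violate (3) for the given delta."""
    rng = random.Random(seed)
    x = list(y)
    failures = 0
    for _ in range(trials):
        rng.shuffle(x)  # uniformly random permutation of y's entries
        if not (1 - delta <= energy_ratio(frame, x) <= 1 + delta):
            failures += 1
    return failures / trials

def normalized_gaussian_frame(m, n, seed=0):
    """N columns drawn i.i.d. Gaussian in R^m, each scaled to unit norm."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in g))
        columns.append([v / norm for v in g])
    return columns

# An orthonormal basis preserves energy exactly, so no permutation fails.
identity = [[1.0 if r == c else 0.0 for r in range(6)] for c in range(6)]
rate_identity = weak_rip_failure_rate(identity, [2.0, -1.0, 0.0, 0.0, 0.0, 0.0], delta=0.1)

# A small random frame and a 3-sparse vector (illustrative sizes only).
frame = normalized_gaussian_frame(m=40, n=80)
y_sparse = [1.0, -1.0, 1.0] + [0.0] * 77
rate_gaussian = weak_rip_failure_rate(frame, y_sparse, delta=0.5, trials=500)
```

The returned fraction plays the role of an empirical stand-in for $p$ at the chosen $\delta$ and the chosen $y$.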

At first glance, it may seem odd that we introduce a random permutation when we might as well define Weak RIP
in terms of a $K$-sparse vector whose support is drawn randomly from all $\binom{N}{K}$ possible choices. In fact, both versions
would be equivalent in distribution, but we stress that in the present definition, the values of the nonzero entries of $x$
are not random; rather, the only randomness we have is in the locations of the nonzero entries. We wish to distinguish
our results from those in [11], which explicitly require randomness in the values of the nonzero entries. We also note
the distinction between RIP and Weak RIP — Weak RIP requires that F preserves the energy of most sparse vectors. 
Moreover, the manner in which we quantify "most" is important. For each sparse vector, F preserves the energy of 
most permutations of that vector, but for different sparse vectors, F might not preserve the energy of permutations 
with the same support. That is, unlike RIP, Weak RIP is not a statement about the singular values of submatrices of F. 
Certainly, matrices for which most submatrices are well-conditioned, such as those discussed in [49, 50], will satisfy
Weak RIP, but Weak RIP does not require this. That said, the following theorem shows, in part, the significance of the 
Strong Coherence Property. 

Theorem 3. Any $M \times N$ unit norm frame $F$ that satisfies the Strong Coherence Property also satisfies the $(K, \delta, \frac{4K}{N^2})$-
Weak Restricted Isometry Property provided $N \ge 128$ and $2K \log N \le \min\{\frac{\delta^2}{100\mu_F^2}, M\}$.

Proof. Let $x$ be as in Definition 2. Note that (3) is equivalent to $\big|\|Fx\|^2 - \|x\|^2\big| \le \delta\|x\|^2$. Defining $\mathcal{K} := \{n : |x_n| > 0\}$,
then the Cauchy-Schwarz inequality gives

$\big|\|Fx\|^2 - \|x\|^2\big| = |x_{\mathcal{K}}^*(F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K)x_{\mathcal{K}}| \le \|x_{\mathcal{K}}\| \, \|(F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K)x_{\mathcal{K}}\| \le \sqrt{K} \, \|x_{\mathcal{K}}\| \, \|(F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K)x_{\mathcal{K}}\|_\infty$, (4)

where the last inequality uses the fact that $\|\cdot\| \le \sqrt{K}\|\cdot\|_\infty$ in $\mathbb{C}^K$. We now consider [5, Lemma 3], which states that
for any $\epsilon \in [0, 1)$ and $a \ge 1$, $\|(F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K)x_{\mathcal{K}}\|_\infty \le \epsilon\|x_{\mathcal{K}}\|$ with probability exceeding $1 - 4Ke^{-(\epsilon - \sqrt{K}\nu_F)^2/16(2+a^{-1})^2\mu_F^2}$
provided $K \le \min\{\epsilon^2\nu_F^{-2}, (1 + a)^{-1}N\}$. We claim that (4) together with [5, Lemma 3] guarantee $\big|\|Fx\|^2 - \|x\|^2\big| \le \delta\|x\|^2$
with probability exceeding $1 - \frac{4K}{N^2}$. In order to establish this claim, we fix $\epsilon = 10\mu_F\sqrt{2\log N}$ and $a = 2\log 128 - 1$.
It is then easy to see that (SCP-1) gives $\epsilon < 1$, and also that (SCP-2) and $2K\log N \le M$ give $K \le \epsilon^2\nu_F^{-2}/9$.
Therefore, since the assumption that $N \ge 128$ together with $2K\log N \le M$ implies $K \le (1 + a)^{-1}N$, we obtain
$e^{-(\epsilon - \sqrt{K}\nu_F)^2/16(2+a^{-1})^2\mu_F^2} \le \frac{1}{N^2}$. The result now follows from the observation that $2K\log N \le \frac{\delta^2}{100\mu_F^2}$ implies $\sqrt{K}\epsilon \le \delta$. □

Algorithm 1 One-Step Thresholding (OST) for sparse signal reconstruction [5]

Input: An $M \times N$ unit norm frame $F$, a vector $y = Fx + e$, and a threshold $\lambda > 0$
Output: An estimate $\hat{x} \in \mathbb{C}^N$ of the true sparse signal $x$

$\hat{x} \leftarrow 0$   {Initialize}
$z \leftarrow F^*y$   {Form signal proxy}
$\hat{\mathcal{K}} \leftarrow \{n : |z_n| > \lambda\}$   {Select indices via OST}
$\hat{x}_{\hat{\mathcal{K}}} \leftarrow (F_{\hat{\mathcal{K}}})^\dagger y$   {Reconstruct signal via least-squares}

This theorem shows that having small worst-case and average coherence is enough to guarantee Weak RIP. 
This contrasts with related results by Tropp [49, 50] that require $F$ to be nearly tight. In fact, the proof of
Theorem 3 does not even use the full power of the Strong Coherence Property; instead of (SCP-1), it suffices to have
$\mu_F \le 1/(15\sqrt{\log N})$, part of what [5] calls the Coherence Property. Also, if $F$ has worst-case coherence $\mu_F = O(1/\sqrt{M})$
and average coherence $\nu_F = O(1/M)$, then even if $F$ has large spectral norm, Theorem 3 states that $F$ preserves the
energy of most $K$-sparse vectors with $K = O(M/\log N)$, i.e., the sparsity regime which is linear in the number of
measurements. 

2.2. Reconstruction of sparse signals from noisy measurements 

Another common task in signal processing applications is to reconstruct a $K$-sparse signal $x \in \mathbb{C}^N$ from a small
collection of linear measurements $y \in \mathbb{C}^M$. Recently, Tropp [50] used both the worst-case coherence and spectral
norm of frames to find bounds on the reconstruction performance of basis pursuit (BP) [11] for most support sets
under the assumption that the nonzero entries of $x$ are independent with zero median. In contrast, [5] used the spectral
norm and worst-case and average coherence of frames to find bounds on the reconstruction performance of OST for
most support sets and arbitrary nonzero entries. However, both [5] and [50] limit themselves to recovering $x$ in the
absence of noise, corresponding to $y = Fx$, a rather ideal scenario.

Our goal in this section is to provide guarantees for the reconstruction of sparse signals from noisy measurements 
$y = Fx + e$, where the entries of the noise vector $e \in \mathbb{C}^M$ are independent, identical complex-Gaussian random variables
with mean zero and variance $\sigma^2$. In particular, and in contrast with [21], our guarantees will hold for arbitrary unit
norm frames $F$ without requiring the signal's sparsity level to satisfy $K = O(\mu_F^{-1})$. The reconstruction algorithm that
we analyze here is the OST algorithm of [5], which is described in Algorithm 1. The following theorem extends
the analysis of [5] and shows that the OST algorithm leads to near-optimal reconstruction error for certain important
classes of sparse signals.
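
The steps of Algorithm 1 can be sketched in a few lines. The version below is a minimal illustration for real-valued frames, not the implementation of [5]: the helper names are hypothetical, the frame is stored as a list of columns, and the least-squares step solves the normal equations by Gaussian elimination, which matches the pseudoinverse only under the assumption that the selected columns are linearly independent.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def one_step_thresholding(frame, y, lam):
    """Return (selected indices, estimate) for a real frame given as N columns."""
    n = len(frame)
    z = [dot(f, y) for f in frame]                       # signal proxy F^T y
    support = [k for k in range(n) if abs(z[k]) > lam]   # thresholding step
    x_hat = [0.0] * n
    if support:
        cols = [frame[k] for k in support]
        gram = [[dot(a, b) for b in cols] for a in cols]
        rhs = [dot(a, y) for a in cols]
        for k, coeff in zip(support, solve(gram, rhs)):  # least-squares fit
            x_hat[k] = coeff
    return support, x_hat

# Toy frame in R^3: the standard basis plus one extra unit vector.
s = 1.0 / math.sqrt(3.0)
frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, s]]
x_true = [3.0, 0.0, -2.0, 0.0]
noise = [0.1, -0.1, 0.05]
y = [sum(x_true[n] * frame[n][r] for n in range(4)) + noise[r] for r in range(3)]
support, x_hat = one_step_thresholding(frame, y, lam=1.0)
```

In this toy run the proxy of the fourth column stays below the threshold, so only the two true support indices are selected, and the estimate differs from the true signal by the noise on those coordinates.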

Before proceeding further, we first define some notation. We use $\mathrm{snr} := \|x\|^2/\mathbb{E}[\|e\|^2]$ to denote the signal-to-noise
ratio associated with the signal reconstruction problem. Also, we use $\mathcal{T}_\sigma(t) := \{n : |x_n| > \frac{2\sqrt{2}}{1-t}\sqrt{2\sigma^2\log N}\}$ for any
$t \in (0, 1)$ to denote the locations of all the entries of $x$ that, roughly speaking, lie above the noise floor $\sigma$. Finally, we
use $\mathcal{T}_\mu(t) := \{n : |x_n| > \frac{20}{t}\mu_F\|x\|\sqrt{2\log N}\}$ to denote the locations of entries of $x$ that, roughly speaking, lie above the
self-interference floor $\mu_F\|x\|$.

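Theorem 4 (Reconstruction of sparse signals). Take an $M \times N$ unit norm frame $F$ which satisfies the Strong Coherence
Property, pick $t \in (0, 1)$, and choose $\lambda = \sqrt{2\sigma^2\log N}\,\max\{\frac{10}{t}\mu_F\sqrt{M\,\mathrm{snr}}, \frac{\sqrt{2}}{1-t}\}$. Further, suppose $x \in \mathbb{C}^N$ has support
$\mathcal{K}$ drawn uniformly at random from all possible $K$-subsets of $\{1, \ldots, N\}$. Then provided

$K \le \frac{N}{c_1^2\|F\|_2^2\log N}$, (5)

Algorithm 1 produces $\hat{\mathcal{K}}$ such that $\mathcal{T}_\sigma(t) \cap \mathcal{T}_\mu(t) \subseteq \hat{\mathcal{K}} \subseteq \mathcal{K}$ and $\hat{x}$ such that

$\|x - \hat{x}\| \le c_2\sqrt{\sigma^2|\hat{\mathcal{K}}|\log N} + c_3\|x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\|$ (6)

with probability exceeding $1 - 10N^{-1}$. Finally, defining $T := |\mathcal{T}_\sigma(t) \cap \mathcal{T}_\mu(t)|$, we further have

$\|x - \hat{x}\| \le c_2\sqrt{\sigma^2K\log N} + c_3\|x - x_T\|$ (7)

in the same probability event. Here, $c_1 = 37e$, $c_2 = \frac{2}{1 - e^{-1/2}}$, and $c_3 = 1 + \frac{e^{-1/2}}{1 - e^{-1/2}}$ are numerical constants.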

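Proof. To begin, note that since $\|F\|_2^2 \ge \frac{N}{M}$, we have from (5) that $K \le M/(2\log N)$. It is then easy to conclude from
[5, Theorem 5] that $\hat{\mathcal{K}}$ satisfies $\mathcal{T}_\sigma(t) \cap \mathcal{T}_\mu(t) \subseteq \hat{\mathcal{K}} \subseteq \mathcal{K}$ with probability exceeding $1 - 6N^{-1}$. Therefore, conditioned
on the event $\mathcal{E}_1 := \{\mathcal{T}_\sigma(t) \cap \mathcal{T}_\mu(t) \subseteq \hat{\mathcal{K}} \subseteq \mathcal{K}\}$, we can make use of the triangle inequality to write

$\|x - \hat{x}\| \le \|x_{\hat{\mathcal{K}}} - \hat{x}_{\hat{\mathcal{K}}}\| + \|x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\|$. (8)

Next, we may use (5) and the fact that $F$ satisfies the Strong Coherence Property to conclude from [49] (see, e.g.,
[5, Proposition 3]) that $\|F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K\|_2 < e^{-1/2}$ with probability exceeding $1 - 2N^{-1}$. Hence, conditioning on $\mathcal{E}_1$ and
$\mathcal{E}_2 := \{\|F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K\|_2 < e^{-1/2}\}$, we have that $(F_{\hat{\mathcal{K}}})^\dagger = (F_{\hat{\mathcal{K}}}^*F_{\hat{\mathcal{K}}})^{-1}F_{\hat{\mathcal{K}}}^*$, since $F_{\hat{\mathcal{K}}}$ is a submatrix of a full column rank
matrix $F_{\mathcal{K}}$. Therefore, given $\mathcal{E}_1$ and $\mathcal{E}_2$, we may write

$\hat{x}_{\hat{\mathcal{K}}} = (F_{\hat{\mathcal{K}}})^\dagger(Fx + e) = x_{\hat{\mathcal{K}}} + (F_{\hat{\mathcal{K}}})^\dagger F_{\mathcal{K} \setminus \hat{\mathcal{K}}}x_{\mathcal{K} \setminus \hat{\mathcal{K}}} + (F_{\hat{\mathcal{K}}})^\dagger e$, (9)

and so substituting (9) into (8) and applying the triangle inequality gives

$\|x - \hat{x}\| \le \|(F_{\hat{\mathcal{K}}})^\dagger F_{\mathcal{K} \setminus \hat{\mathcal{K}}}x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\| + \|(F_{\hat{\mathcal{K}}})^\dagger e\| + \|x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\|$

$\le \big(1 + \|(F_{\hat{\mathcal{K}}}^*F_{\hat{\mathcal{K}}})^{-1}\|_2\,\|F_{\hat{\mathcal{K}}}^*F_{\mathcal{K} \setminus \hat{\mathcal{K}}}\|_2\big)\|x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\| + \|(F_{\hat{\mathcal{K}}}^*F_{\hat{\mathcal{K}}})^{-1}\|_2\,\|F_{\hat{\mathcal{K}}}^*e\|$. (10)

Since, given $\mathcal{E}_1$, we have that $F_{\hat{\mathcal{K}}}^*F_{\hat{\mathcal{K}}} - I_{|\hat{\mathcal{K}}|}$ and $F_{\hat{\mathcal{K}}}^*F_{\mathcal{K} \setminus \hat{\mathcal{K}}}$ are submatrices of $F_{\mathcal{K}}^*F_{\mathcal{K}} - I_K$, and since the spectral norm
of a matrix provides an upper bound for the spectral norms of its submatrices, we have the following given $\mathcal{E}_1$ and $\mathcal{E}_2$:
$\|F_{\hat{\mathcal{K}}}^*F_{\mathcal{K} \setminus \hat{\mathcal{K}}}\|_2 < e^{-1/2}$ and $\|(F_{\hat{\mathcal{K}}}^*F_{\hat{\mathcal{K}}})^{-1}\|_2 < \frac{1}{1 - e^{-1/2}}$. We can now substitute these bounds into (10) and make use of the
fact that $\|F_{\hat{\mathcal{K}}}^*e\| \le |\hat{\mathcal{K}}|^{1/2}\|F_{\hat{\mathcal{K}}}^*e\|_\infty$ to conclude that

$\|x - \hat{x}\| \le \frac{|\hat{\mathcal{K}}|^{1/2}}{1 - e^{-1/2}}\|F_{\hat{\mathcal{K}}}^*e\|_\infty + \Big(1 + \frac{e^{-1/2}}{1 - e^{-1/2}}\Big)\|x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\|$,

given $\mathcal{E}_1$ and $\mathcal{E}_2$. At this point, define the event $\mathcal{E}_3 = \{\|F^*e\|_\infty \le 2\sqrt{\sigma^2\log N}\}$ and note from [5, Lemma 6] that
$\Pr(\mathcal{E}_3^c) \le 2(\sqrt{2\pi\log N}\,N)^{-1}$. A union bound therefore gives (6) with probability exceeding $1 - 10N^{-1}$. For (7), note
that $\hat{\mathcal{K}} \subseteq \mathcal{K}$ implies $|\hat{\mathcal{K}}| \le K$, and so $\mathcal{T}_\sigma(t) \cap \mathcal{T}_\mu(t) \subseteq \hat{\mathcal{K}}$ implies that $\|x_{\mathcal{K} \setminus \hat{\mathcal{K}}}\| \le \|x_{\mathcal{K} \setminus (\mathcal{T}_\sigma(t) \cap \mathcal{T}_\mu(t))}\| = \|x - x_T\|$. □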

A few remarks are in order now for Theorem 4. First, if $F$ satisfies the Strong Coherence Property and $F$ is nearly
tight, then OST handles sparsity that is almost linear in $M$: $K = O(M/\log N)$ from (5). Second, we do not impose any
control over the size of $T$, but rather we state the result in generality in terms of $T$; its size is determined by the signal
class $x$ belongs to, the worst-case coherence of the frame $F$ we use to measure $x$, and the magnitude of the noise that
perturbs $Fx$. Third, the $\ell_2$ error associated with the OST algorithm is the near-optimal (modulo the log factor) error of
$\sqrt{\sigma^2K\log N}$ plus the best $T$-term approximation error caused by the inability of the OST algorithm to recover signal
entries that are smaller than $O(\mu_F\|x\|\sqrt{2\log N})$. In particular, if the $K$-sparse signal $x$, the worst-case coherence $\mu_F$,
and the noise $e$ together satisfy $\|x - x_T\| = O(\sqrt{\sigma^2K\log N})$, then the OST algorithm succeeds with a near-optimal $\ell_2$
error of $\|x - \hat{x}\| = O(\sqrt{\sigma^2K\log N})$. To see why this error is near-optimal, note that a $K$-dimensional vector of random
entries with mean zero and variance $\sigma^2$ has expected squared norm $\sigma^2K$; in our case, we pay an additional log factor
to find the locations of the $K$ nonzero entries among the entire $N$-dimensional signal. It is important to recognize that
the optimality condition $\|x - x_T\| = O(\sqrt{\sigma^2K\log N})$ depends on the signal class, the noise variance, and the worst-case
coherence of the frame; in particular, the condition is satisfied whenever $\|x_{\mathcal{K} \setminus \mathcal{T}_\mu(t)}\| = O(\sqrt{\sigma^2K\log N})$, since

$\|x - x_T\| \le \|x_{\mathcal{K} \setminus \mathcal{T}_\sigma(t)}\| + \|x_{\mathcal{K} \setminus \mathcal{T}_\mu(t)}\| = O(\sqrt{\sigma^2K\log N}) + \|x_{\mathcal{K} \setminus \mathcal{T}_\mu(t)}\|$.

The following lemma provides classes of sparse signals that satisfy $\|x_{\mathcal{K} \setminus \mathcal{T}_\mu(t)}\| = O(\sqrt{\sigma^2K\log N})$ given sufficiently
small noise variance and worst-case coherence, and consequently the OST algorithm is near-optimal for the
reconstruction of such signal classes.
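Lemma 5. Take an $M \times N$ unit norm frame $F$ with worst-case coherence $\mu_F \le \frac{c_0}{\sqrt{M}}$ for some $c_0 > 0$, and suppose that
$K \le \frac{N}{c_1^2\|F\|_2^2\log N}$ for some $c_1 > 0$. Fix a constant $\beta \in (0, 1]$, and suppose the magnitudes of $\beta K$ nonzero entries of $x$ are
some $\alpha = \Omega(\sqrt{\sigma^2\log N})$, while the magnitudes of the remaining $(1 - \beta)K$ nonzero entries are not necessarily the same,
but are smaller than $\alpha$ and scale as $O(\sqrt{\sigma^2\log N})$. Then $\|x_{\mathcal{K} \setminus \mathcal{T}_\mu(t)}\| = O(\sqrt{\sigma^2K\log N})$, provided $c_0 \le \frac{tc_1}{20\sqrt{2}}$.

Proof. Let $\mathcal{K}$ be the support of $x$, and define $\mathcal{I} := \{n : |x_n| = \alpha\}$. We wish to show that $\mathcal{I} \subseteq \mathcal{T}_\mu(t)$, since this implies
$\|x_{\mathcal{K} \setminus \mathcal{T}_\mu(t)}\| \le \|x_{\mathcal{K} \setminus \mathcal{I}}\| = O(\sqrt{\sigma^2K\log N})$. In order to prove $\mathcal{I} \subseteq \mathcal{T}_\mu(t)$, notice that

$\|x\|^2 = \|x_{\mathcal{I}}\|^2 + \|x_{\mathcal{K} \setminus \mathcal{I}}\|^2 \le \beta K\alpha^2 + (1 - \beta)K\alpha^2 = K\alpha^2$,
and so combining this with the fact that $\|F\|_2^2 \ge \frac{N}{M}$ gives

$\frac{20}{t}\mu_F\|x\|\sqrt{2\log N} \le \frac{20}{t}\mu_F\sqrt{2K\log N}\,\alpha \le \frac{20c_0}{t}\sqrt{\frac{2K\log N}{M}}\,\alpha \le \frac{20\sqrt{2}\,c_0}{tc_1}\,\alpha$.

Therefore, provided $c_0 \le \frac{tc_1}{20\sqrt{2}}$, we have that $\mathcal{I} \subseteq \mathcal{T}_\mu(t)$. □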

In words, Lemma 5 implies that OST is near-optimal for those $K$-sparse signals whose entries above the noise floor
have roughly the same magnitude. This subsumes a very important class of signals that appears in applications such as
multi-label prediction [32], in which all the nonzero entries take values $\pm\alpha$. To the best of our knowledge, Theorem 4
is the first result in the sparse signal processing literature that does not require RIP and still provides near-optimal
reconstruction guarantees for such signals from noisy measurements, while using either random or deterministic
frames, even when $K = O(M/\log N)$.

We note that our techniques can be extended to reconstruct noisy signals, that is, we may consider measurements
of the form $y = F(x + n) + e$, where $n \in \mathbb{C}^N$ is also a noise vector of independent, identical zero-mean complex-
Gaussian random variables. In particular, if the frame $F$ is tight, then our measurements will not color the noise, and
so noise in the signal may be viewed as noise in the measurements: $y = Fx + (Fn + e)$; if the frame is not tight, then
the noise will become correlated in the measurements, and performance would depend nontrivially on the frame's
Gram matrix. Also, the authors have had some success with generalizing Theorem 4 to approximately sparse signals;
the analysis follows similar lines, but is rather cumbersome, and it appears as though the end result is only strong
enough in the case of very nearly sparse signals. As such, we omit this result.



3. Frame constructions 

In this section, we consider a range of nearly tight frames with small worst-case and average coherence. We 
investigate various ways of selecting frames at random from different libraries, and we show that for each of these 
frames, the spectral norm, worst-case coherence, and average coherence are all small with high probability. Later, 
we will consider deterministic constructions that use Gabor and chirp systems, spherical designs, equiangular tight 
frames, and error-correcting codes. For the reader's convenience, all of these constructions are summarized in Table 1.
Before we go any further, recall the following lower bound on worst-case coherence: 



Theorem 6 (Welch bound [46]). Every $M \times N$ unit norm frame $F$ has worst-case coherence $\mu_F \ge \sqrt{\frac{N-M}{M(N-1)}}$.
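
As a small numerical illustration of the Welch bound, the sketch below evaluates $\sqrt{(N-M)/(M(N-1))}$ and compares it with the worst-case coherence of a frame that is known to meet it: three unit vectors in $\mathbb{R}^2$ spaced 120 degrees apart. The function names are hypothetical and the example is restricted to real frames.

```python
import math

def welch_bound(m, n):
    """Lower bound on the worst-case coherence of any M x N unit norm frame."""
    return math.sqrt((n - m) / (m * (n - 1)))

def worst_case_coherence(frame):
    """Largest |<f_i, f_j>| over distinct columns of a real frame."""
    n = len(frame)
    return max(abs(sum(a * b for a, b in zip(frame[i], frame[j])))
               for i in range(n) for j in range(i + 1, n))

# Three unit vectors in R^2 at 120-degree spacing: M = 2, N = 3.
frame = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]
mu = worst_case_coherence(frame)
bound = welch_bound(2, 3)
```

Here both quantities equal $1/2$ up to rounding, so this frame achieves the bound; for $N = M$ the bound is zero, consistent with orthonormal bases.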



We will use the Welch bound in the proof of the following lemma, which gives three different sufficient conditions 
for a frame to satisfy (SCP-2). These conditions will prove quite useful in this section and throughout the paper. 

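Lemma 7. For any $M \times N$ unit norm frame $F$, each of the following conditions implies $\nu_F \le \frac{\mu_F}{\sqrt{M}}$:

(i) $\langle f_k, \sum_{n=1}^N f_n \rangle = \frac{N}{M}$ for every $k = 1, \ldots, N$,

(ii) $N \ge 2M$ and $\sum_{n=1}^N f_n = 0$,

(iii) $N \ge M^2 + 3M + 3$ and $\|\sum_{n=1}^N f_n\|^2 \le N$.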
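
To see one of these sufficient conditions at work, the following sketch checks condition (ii) on a toy frame, five unit vectors in $\mathbb{R}^2$ at the vertices of a regular pentagon, and then confirms the conclusion $\nu_F \le \mu_F/\sqrt{M}$ directly. The helper names and the numerical tolerance are assumptions made for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coherences(frame):
    """Return (mu_F, nu_F) for a real unit norm frame given as a list of columns."""
    n = len(frame)
    mu = max(abs(dot(frame[i], frame[j])) for i in range(n) for j in range(i + 1, n))
    nu = max(abs(sum(dot(frame[i], frame[j]) for j in range(n) if j != i))
             for i in range(n)) / (n - 1)
    return mu, nu

def lemma7_condition_ii(frame, tol=1e-12):
    """Condition (ii): N >= 2M and the frame elements sum to zero (up to tol)."""
    n, m = len(frame), len(frame[0])
    total = [sum(f[r] for f in frame) for r in range(m)]
    return n >= 2 * m and all(abs(t) <= tol for t in total)

# Vertices of a regular pentagon on the unit circle: M = 2, N = 5.
pentagon = [[math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5)] for k in range(5)]
mu, nu = coherences(pentagon)
condition_holds = lemma7_condition_ii(pentagon)
conclusion_holds = nu <= mu / math.sqrt(2)
```

Because the vectors sum to zero, each $\sum_{j \ne i}\langle f_i, f_j\rangle$ equals $-1$, so $\nu_F = 1/(N-1) = 1/4$, which matches the expression used for condition (ii) in the proof.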
Proof. For condition (i), we have

$\nu_F = \frac{1}{N-1}\max_i \Big|\sum_{j \ne i}\langle f_i, f_j\rangle\Big| = \frac{1}{N-1}\max_i \Big|\Big\langle f_i, \sum_{j=1}^N f_j\Big\rangle - 1\Big| = \frac{1}{N-1}\Big(\frac{N}{M} - 1\Big)$.

The Welch bound therefore gives $\nu_F = \frac{N-M}{M(N-1)} \le \mu_F\sqrt{\frac{N-M}{M(N-1)}} \le \frac{\mu_F}{\sqrt{M}}$. For condition (ii), we have

$\nu_F = \frac{1}{N-1}\max_i \Big|\Big\langle f_i, \sum_{j=1}^N f_j\Big\rangle - 1\Big| = \frac{1}{N-1}$.

Considering the Welch bound, it suffices to show $\frac{1}{N-1} \le \frac{1}{\sqrt{M}}\sqrt{\frac{N-M}{M(N-1)}}$. Rearranging equivalently gives

$N^2 - (M + 1)N - M(M - 1) \ge 0$. (11)

When $N = 2M$, the left-hand side of (11) becomes $M(M - 1)$, which is trivially nonnegative. Otherwise, we have

$N \ge 2M + 1 > M + 1 + \sqrt{M(M-1)} \ge \frac{M+1}{2} + \sqrt{\Big(\frac{M+1}{2}\Big)^2 + M(M-1)}$.

In this case, by the quadratic formula and the fact that the left-hand side of (11) is concave up in $N$, we have that (11)
is indeed satisfied. For condition (iii), we use the triangle and Cauchy-Schwarz inequalities to get

$\nu_F = \frac{1}{N-1}\max_i \Big|\Big\langle f_i, \sum_{j=1}^N f_j\Big\rangle - 1\Big| \le \frac{1}{N-1}\Big(\max_i \|f_i\|\,\Big\|\sum_{j=1}^N f_j\Big\| + 1\Big) \le \frac{\sqrt{N} + 1}{N - 1}$.

Considering the Welch bound, it suffices to show $\frac{\sqrt{N}+1}{N-1} \le \frac{1}{\sqrt{M}}\sqrt{\frac{N-M}{M(N-1)}}$. Taking $x := \sqrt{N}$ and rearranging gives a
polynomial: $x^4 - (M^2 + M + 1)x^2 - 2M^2x - M(M - 1) \ge 0$. By convexity and monotonicity of the polynomial in
$[M + 1, \infty)$, it can be shown that the largest real root of this polynomial is always smaller than $M + \frac{3}{2}$. Also, considering
it is concave up in $x$, it suffices that $\sqrt{N} = x \ge M + \frac{3}{2}$, which we have since $N \ge M^2 + 3M + 3 \ge (M + \frac{3}{2})^2$. □



3.1. Normalized Gaussian frames 

Construct a matrix with independent, Gaussian-distributed entries that have zero mean and unit variance. By 
normalizing the columns, we get a matrix called a normalized Gaussian frame. This is perhaps the most widely 
studied type of frame in the signal processing and statistics literature. To be clear, the term "normalized" is intended 
to distinguish the results presented here from results reported in earlier works, such as [5, 6, 13, 52], which only ensure
that Gaussian frame elements have unit norm in expectation. In other words, normalized Gaussian frame elements are
independently and uniformly distributed on the unit hypersphere in $\mathbb{R}^M$. That said, the following theorem characterizes
the spectral norm and the worst-case and average coherence of normalized Gaussian frames. 

Theorem 8 (Geometry of normalized Gaussian frames). Build a real $M \times N$ frame $G$ by drawing entries independently
at random from a Gaussian distribution of zero mean and unit variance. Next, construct a normalized Gaussian
frame $F$ by taking $f_n := g_n/\|g_n\|$ for every $n = 1, \ldots, N$. Provided $60\log N \le M \le \frac{N-1}{4\log N}$, then the following inequalities
simultaneously hold with probability exceeding $1 - 11N^{-1}$:

(i) $\mu_F \le \frac{\sqrt{15\log N}}{\sqrt{M} - \sqrt{12\log N}}$,

(ii) $\nu_F \le \frac{\sqrt{15\log N}}{M - \sqrt{12M\log N}}$,

(iii) $\|F\|_2 \le \frac{\sqrt{M} + \sqrt{N} + \sqrt{2\log N}}{\sqrt{M - \sqrt{8M\log N}}}$.
Proof. Theorem 8(i) can be shown to hold with probability exceeding $1 - 2N^{-1}$ by using a bound on the norm of
a Gaussian random vector in [34, Lemma 1] and a bound on the magnitude of the inner product of two independent
Gaussian random vectors in [26, Lemma 6]. Specifically, pick any two distinct indices $i, j \in \{1, \ldots, N\}$, and
define probability events $\mathcal{E}_1 := \{|\langle g_i, g_j\rangle| \le \delta_1\}$, $\mathcal{E}_2 := \{\|g_i\|^2 \ge M(1 - \delta_2)\}$, and $\mathcal{E}_3 := \{\|g_j\|^2 \ge M(1 - \delta_2)\}$ for
$\delta_1 = \sqrt{15M\log N}$ and $\delta_2 = \sqrt{(12\log N)/M}$. Then it follows from the union bound that

$\Pr\Big(|\langle f_i, f_j\rangle| > \frac{\delta_1}{M(1-\delta_2)}\Big) = \Pr\Big(\frac{|\langle g_i, g_j\rangle|}{\|g_i\|\,\|g_j\|} > \frac{\delta_1}{M(1-\delta_2)}\Big) \le \Pr(\mathcal{E}_1^c) + \Pr(\mathcal{E}_2^c) + \Pr(\mathcal{E}_3^c)$.

One can verify that $\Pr(\mathcal{E}_2^c) = \Pr(\mathcal{E}_3^c) \le N^{-3}$ because of [34, Lemma 1], and we further have $\Pr(\mathcal{E}_1^c) \le 2N^{-3}$ because of
[26, Lemma 6] and the fact that $M \ge 60\log N$. Thus, for any fixed $i$ and $j$, $|\langle f_i, f_j\rangle| \le \sqrt{15\log N}/(\sqrt{M} - \sqrt{12\log N})$
with probability exceeding $1 - 4N^{-3}$. It therefore follows by taking a union bound over all $\binom{N}{2}$ choices for $i$ and $j$ that
Theorem 8(i) holds with probability exceeding $1 - 2N^{-1}$.

Theorem 8(ii) can be shown to hold with probability exceeding $1 - 6N^{-1}$ by appealing to the preceding analysis
and Hoeffding's inequality for a sum of independent, bounded random variables [30]. Specifically, fix any index
$i \in \{1, \ldots, N\}$, and define random variables $Z_j^i := \frac{1}{N-1}\langle f_i, f_j\rangle$. Next, define the probability event

$\mathcal{E}_4 := \bigcap_{j \ne i}\Big\{|Z_j^i| \le \frac{1}{N-1}\,\frac{\sqrt{15\log N}}{\sqrt{M} - \sqrt{12\log N}}\Big\}$.

Using the analysis for the worst-case coherence of $F$ and taking a union bound over the $N - 1$ possible $j$'s gives
$\Pr(\mathcal{E}_4^c) \le 4N^{-2}$. Furthermore, taking $\delta_3 := \sqrt{15\log N}/(M - \sqrt{12M\log N})$, then elementary probability analysis gives

$\Pr\Big(\Big|\sum_{j \ne i} Z_j^i\Big| > \delta_3\Big) \le \Pr\Big(\Big|\sum_{j \ne i} Z_j^i\Big| > \delta_3 \,\Big|\, \mathcal{E}_4\Big) + \Pr(\mathcal{E}_4^c)$

$\le \int_{S^{M-1}} \Pr\Big(\Big|\sum_{j \ne i} Z_j^i\Big| > \delta_3 \,\Big|\, \mathcal{E}_4, f_i = x\Big)\,p_{f_i}(x)\,dH^{M-1}(x) + 4N^{-2}$, (12)

where $S^{M-1}$ denotes the unit hypersphere in $\mathbb{R}^M$, $H^{M-1}$ denotes the $(M - 1)$-dimensional Hausdorff measure on $S^{M-1}$,
and $p_{f_i}(x)$ denotes the probability density function for the random vector $f_i$. The first thing to note here is that the
random variables $\{Z_j^i : j \ne i\}$ are bounded and jointly independent when conditioned on $\mathcal{E}_4$ and $f_i$. This assertion
mainly follows from Bayes' rule and the fact that $\{f_j : j \ne i\}$ are jointly independent when conditioned on $f_i$. The
second thing to note is that $\mathbb{E}[Z_j^i \mid \mathcal{E}_4, f_i] = 0$ for every $j \ne i$. This comes from the fact that the random vectors
$\{f_n\}_{n=1}^N$ are independent and have a uniform distribution over $S^{M-1}$, which in turn guarantees that the random variables
$\{Z_j^i : j \ne i\}$ have a symmetric distribution around zero when conditioned on $\mathcal{E}_4$ and $f_i$. We can therefore make use of
Hoeffding's inequality [30] to bound the probability expression inside the integral in (12) as

$\Pr\Big(\Big|\sum_{j \ne i} Z_j^i\Big| > \delta_3 \,\Big|\, \mathcal{E}_4, f_i = x\Big) \le 2e^{-\frac{N-1}{2M}}$, (13)

which is bounded above by $2N^{-2}$ provided $M \le \frac{N-1}{4\log N}$. We can now substitute (13) into (12) and take the union bound
over the $N$ possible choices for $i$ to conclude that Theorem 8(ii) holds with probability exceeding $1 - 6N^{-1}$.

Lastly, Theorem 8(iii) can be shown to hold with probability exceeding $1 - 3N^{-1}$ by using a bound on the spectral
norm of standard Gaussian random matrices reported in [41] along with [34, Lemma 1]. Specifically, define an $N \times N$
diagonal matrix $D := \mathrm{diag}(\|g_1\|^{-1}, \ldots, \|g_N\|^{-1})$, and note that the entries of $G := FD^{-1}$ are independently and normally
distributed with zero mean and unit variance. We therefore have from (2.3) in [41] that

$\Pr\big(\|G\|_2 > \sqrt{M} + \sqrt{N} + \sqrt{2\log N}\big) \le 2N^{-1}$. (14)

In addition, we can appeal to the preceding analysis for the probability bound on Theorem 8(i) and conclude using
[34, Lemma 1] and a union bound over the $N$ possible choices for $i$ that

$\Pr\big(\|D\|_2 > (M - \sqrt{8M\log N})^{-1/2}\big) \le N^{-1}$. (15)

Finally, since $\|F\|_2 \le \|G\|_2\|D\|_2$, we can take a union bound over (14) and (15) to argue that Theorem 8(iii) holds with
probability exceeding $1 - 3N^{-1}$.

The complete result now follows by taking a union bound over the failure probabilities for the conditions (i)-(iii)
in Theorem 8. □

Example 9. To illustrate the bounds in Theorem 8, we ran simulations in MATLAB. Picking $N = 50000$, we observed
30 realizations of normalized Gaussian frames for each $M = 700, 900, 1100$. The distributions of $\mu_F$, $\nu_F$, and $\|F\|_2$
were rather tight, so we only report the ranges of values attained, along with the bounds given in Theorem 8:

M = 700:    $\mu_F \in [0.1849, 0.2072]$                    $\le 0.8458$
            $\nu_F \in [0.5643, 0.6613] \times 10^{-3}$     $\le 0.0320$
            $\|F\|_2 \in [8.0521, 8.0835]$                  $\le 11.9565$

M = 900:    $\mu_F \in [0.1946, 0.2206]$                    $\le 0.6848$
            $\nu_F \in [0.5800, 0.7501] \times 10^{-3}$     $\le 0.0229$
            $\|F\|_2 \in [8.4352, 8.4617]$                  $\le 10.3645$

M = 1100:   $\mu_F \in [0.1807, 0.1988]$                    $\le 0.5852$
            $\nu_F \in [0.5260, 0.6713] \times 10^{-3}$     $\le 0.0177$
            $\|F\|_2 \in [7.7262, 7.7492]$                  $\le 9.2927$



These simulations seem to indicate that our bounds onpp and ||F||2 reflect real-world behavior, at least within an order 
of magnitude, whereas the bound on vp is rather loose. 
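The bound values quoted in Example 9 can be recomputed directly. The following is a minimal sketch, assuming the normalized Gaussian formulas as they appear in Table 1 and its caption; the function names `gaussian_frame_bounds` and `restrictions_hold` are hypothetical and are not taken from the paper.

```python
import math


def gaussian_frame_bounds(M, N):
    """Return (mu_bound, nu_bound, spectral_bound) for a normalized Gaussian frame.

    Assumed formulas (normalized Gaussian row and caption of Table 1):
      mu_F    <= sqrt(15 log N) / (sqrt(M) - sqrt(12 log N))
      nu_F    <= sqrt(15 log N) / (M - sqrt(12 M log N))
      ||F||_2 <= (sqrt(M) + sqrt(N) + sqrt(2 log N)) / sqrt(M - sqrt(8 M log N))
    """
    log_n = math.log(N)
    mu_bound = math.sqrt(15 * log_n) / (math.sqrt(M) - math.sqrt(12 * log_n))
    nu_bound = math.sqrt(15 * log_n) / (M - math.sqrt(12 * M * log_n))
    spectral_bound = (
        math.sqrt(M) + math.sqrt(N) + math.sqrt(2 * log_n)
    ) / math.sqrt(M - math.sqrt(8 * M * log_n))
    return mu_bound, nu_bound, spectral_bound


def restrictions_hold(M, N):
    """Check the restriction 60 log N <= M <= (N - 1) / (4 log N) from Table 1."""
    log_n = math.log(N)
    return 60 * log_n <= M <= (N - 1) / (4 * log_n)


# Evaluate the bounds for the three values of M used in Example 9.
for M in (700, 900, 1100):
    mu, nu, spec = gaussian_frame_bounds(M, 50000)
    print(M, restrictions_hold(M, 50000), mu, nu, spec)
```

Only the bounds are evaluated here; reproducing the observed ranges would additionally require drawing the Gaussian frames themselves.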

3.2. Random harmonic frames 

Random harmonic frames, constructed by randomly selecting rows of a discrete Fourier transform (DFT) matrix and normalizing the resulting columns, have received considerable attention lately in the compressed sensing literature [12, 14, 42]. However, to the best of our knowledge, there is no result in the literature that shows that random harmonic frames have small worst-case coherence. To fill this gap, the following theorem characterizes the spectral norm and the worst-case and average coherence of random harmonic frames.

Theorem 10 (Geometry of random harmonic frames). Let $U$ be an $N \times N$ non-normalized discrete Fourier transform matrix, explicitly, $U_{k\ell} := e^{2\pi i k\ell/N}$ for each $k, \ell = 0, \ldots, N-1$. Next, let $\{B_i\}_{i=0}^{N-1}$ be a collection of independent Bernoulli random variables with mean $\frac{M}{N}$, and take $\mathcal{M} := \{i : B_i = 1\}$. Finally, construct an $|\mathcal{M}| \times N$ harmonic frame $F$ by collecting rows of $U$ which correspond to indices in $\mathcal{M}$ and normalize the columns. Then $F$ is a unit norm tight frame: $\|F\|_2^2 = \frac{N}{|\mathcal{M}|}$. Furthermore, provided $16\log N \le M \le \frac{N}{3}$, the following inequalities simultaneously hold with probability exceeding $1 - 4N^{-1} - N^{-2}$:

(i) $\frac{1}{2}M \le |\mathcal{M}| \le \frac{3}{2}M$,

(ii) $\nu_F \le \frac{\mu_F}{\sqrt{|\mathcal{M}|}}$,

(iii) $\mu_F \le \sqrt{\frac{118(N-M)\log N}{MN}}$.

Proof. The claim that $F$ is tight follows trivially from the fact that the rows of $U$ are orthogonal and that the rows of $F$ correspond to a subset of the rows of $U$. Next, we define the probability events $\mathcal{E}_1 := \{|\mathcal{M}| \le \frac{3}{2}M\}$ and $\mathcal{E}_2 := \{|\mathcal{M}| \ge \frac{1}{2}M\}$, and claim that $\Pr(\mathcal{E}_1^c \cup \mathcal{E}_2^c) \le N^{-1} + N^{-2}$. The proof of this claim follows from a Bernstein-like
large deviation inequality. Specifically, note that \M\ = E/Io' ^' w ^ E[|At|] = M, and so we have from [3, Theo- 
rem A. 1. 12, Theorem A. 1. 13] and 02] pp. 4] that for any 6 l e [0, 1), 

$$\Pr\big(|\mathcal{M}| > (1+\delta_1)M\big) \le e^{-M\delta_1^2(1-\delta_1)/2} \quad\text{and}\quad \Pr\big(|\mathcal{M}| < (1-\delta_1)M\big) \le e^{-M\delta_1^2/2}. \tag{16}$$

Taking $\delta_1 := \frac{1}{2}$, then a union bound gives $\Pr(\mathcal{E}_1^c \cup \mathcal{E}_2^c) \le N^{-1} + N^{-2}$ provided $M \ge 16\log N$. Conditioning on $\mathcal{E}_1 \cap \mathcal{E}_2$, we have that Theorem 10(i) holds trivially, while Theorem 10(ii) follows from Lemma 7. Specifically, we have that $M \le \frac{N}{3}$ guarantees $N \ge 2|\mathcal{M}|$ because of the conditioning on $\mathcal{E}_1 \cap \mathcal{E}_2$, which in turn implies that $F$ satisfies either condition (i) or (ii) of Lemma 7, depending on whether $0 \in \mathcal{M}$. This therefore establishes that Theorem 10(i)-(ii) simultaneously hold with probability exceeding $1 - N^{-1} - N^{-2}$.

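The only remaining claim is that $\mu_F \le \delta_2 := \sqrt{(118(N-M)\log N)/MN}$ with high probability. To this end, define $p := \frac{M}{N}$, and pick any two distinct indices $i, j \in \{0, \ldots, N-1\}$. Note that

$$\langle f_i, f_j\rangle = \frac{1}{|\mathcal{M}|}\sum_{k=0}^{N-1} B_k U_{ki}\overline{U_{kj}} = \frac{1}{|\mathcal{M}|}\sum_{k=0}^{N-1} (B_k - p)\, U_{ki}\overline{U_{kj}}, \tag{17}$$

where the last equality follows from the fact that $U$ has orthogonal columns. Next, we write $U_{ki}\overline{U_{kj}} = \cos(\theta_k) + \mathrm{i}\sin(\theta_k)$ for some $\theta_k \in [0, 2\pi)$. Then applying the union bound to (17) and to the real and imaginary parts of $U_{ki}\overline{U_{kj}}$ gives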

N-l 



Pr(K/«,/;>l > 62) < Pr (I J>* - p)U ki U kj \ > f§) + Pr(|At| < 

fc=0 

,N-l N-l 

<Pr(|£(**-,p)coii(fc)| > ^) + Pr(|2( B *-P)s in (^)| > ^) + AT 3 , (18) 



ifc=0 *=0 



where the last term follows from (16) and the fact that $M \ge 16\log N$. Define random variables $Z_k := (B_k - p)\cos(\theta_k)$. Note that the $Z_k$'s have zero mean and are jointly independent. Also, the $Z_k$'s are bounded by $1 - p$ almost surely since $|(B_k - p)\cos(\theta_k)| \le \max\{p, 1-p\}$ and $N \ge 2M$. Moreover, the variance of each $Z_k$ is bounded: $\mathrm{var}(Z_k) \le p(1-p)$. Therefore, we may use the Bernstein inequality for a sum of independent, bounded random variables [8] to bound the
probability that | 2£o z k\ deviates from 63 := 

Pr(| J](B k -p) C o S (e k )\ > £3) < 2e^/(2^d-/'H2(i-/')'5 3 /3) < 2 N- 3 . 

k=0 

Similarly, the probability that $|\sum_{k=0}^{N-1}(B_k - p)\sin(\theta_k)| > \delta_3$ is also bounded above by $2N^{-3}$. Substituting these probability bounds into (18) gives $|\langle f_i, f_j\rangle| > \delta_2$ with probability at most $5N^{-3}$ provided $M \ge 16\log N$. Finally, we take a union bound over the possible choices for $i$ and $j$ to get that Theorem 10(iii) holds with probability exceeding $1 - 3N^{-1}$.

The result now follows by taking a final union bound over $\mathcal{E}_1^c \cup \mathcal{E}_2^c$ and $\{\mu_F > \delta_2\}$. □

As stated earlier, random harmonic frames are not new to sparse signal processing. Interestingly, for the application of compressed sensing, [13, 42] provides performance guarantees for both random harmonic and Gaussian frames, but requires more rows in a random harmonic frame to accommodate the same level of sparsity. This suggests that random harmonic frames may be inferior to Gaussian frames as compressed sensing matrices, but practice suggests otherwise [22]. In a sense, Theorem 10 helps to resolve this gap in understanding; there exist compressed sensing algorithms whose performance is dictated by worst-case coherence [5, 21, 47, 50], and Theorem 10 states that random harmonic frames have near-optimal worst-case coherence, being on the order of the Welch bound with an additional $\sqrt{\log N}$ factor.



Example 11. To illustrate the bounds in Theorem 10, we ran simulations in MATLAB. Picking $N = 5000$, we observed 30 realizations of random harmonic frames for each $M = 1000, 1250, 1500$. The distributions of $|\mathcal{M}|$, $\nu_F$, and $\mu_F$ were rather tight, so we only report the ranges of values attained, along with the bounds given in Theorem 10. Notice that Theorem 10 gives a bound on $\nu_F$ in terms of both $|\mathcal{M}|$ and $\mu_F$. To simplify matters, we show that $\nu_F \le \frac{\min \mu_F}{\sqrt{\max|\mathcal{M}|}} \le \frac{\mu_F}{\sqrt{|\mathcal{M}|}}$, where the minimum and maximum are taken over all realizations in the sample:

$M = 1000$:   $|\mathcal{M}| \in [961, 1052] \subseteq [500, 1500]$
              $\nu_F \in [0.2000, 0.8082] \times 10^{-3} \le 0.0023 \approx \frac{\min\mu_F}{\sqrt{1052}}$
              $\mu_F \in [0.0746, 0.0890] \le 0.8967$

$M = 1250$:   $|\mathcal{M}| \in [1207, 1305] \subseteq [625, 1875]$
              $\nu_F \in [0.2000, 0.6273] \times 10^{-3} \le 0.0018 \approx \frac{\min\mu_F}{\sqrt{1305}}$
              $\mu_F \in [0.0623, 0.0774] \le 0.7766$

$M = 1500$:   $|\mathcal{M}| \in [1454, 1590] \subseteq [750, 2250]$
              $\nu_F \in [0.2000, 0.4841] \times 10^{-3} \le 0.0015 \approx \frac{\min\mu_F}{\sqrt{1590}}$
              $\mu_F \in [0.0571, 0.0743] \le 0.6849$

The reader may have noticed how consistently the average coherence value of $\nu_F \approx 0.2000 \times 10^{-3}$ was realized. This occurs precisely when the zeroth row of the DFT is not selected, as the frame elements sum to zero in this case:

$$\nu_F := \frac{1}{N-1}\max_{i}\Big|\sum_{\substack{j=1 \\ j\neq i}}^{N}\langle f_i, f_j\rangle\Big| = \frac{1}{N-1}\max_{i}\Big|\Big\langle f_i, \sum_{j=1}^{N} f_j\Big\rangle - \|f_i\|^2\Big| = \frac{1}{N-1}.$$

These simulations seem to indicate that our bounds on $|\mathcal{M}|$, $\nu_F$, and $\mu_F$ leave room for improvement. The only bound that lies within an order of magnitude of real-world behavior is our bound on $|\mathcal{M}|$.
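The observation about the zeroth DFT row can be checked numerically on a smaller instance. The sketch below is illustrative only: the function name `random_harmonic_coherence`, the seed, and the sizes $N = 512$, $M = 128$ are assumptions chosen so that pure Python runs quickly, and they are not the MATLAB settings of Example 11. It uses the fact that, after normalization, $\langle f_i, f_j\rangle$ depends only on $i - j$ modulo $N$.

```python
import cmath
import math
import random


def random_harmonic_coherence(N, M, seed=0):
    """Sample a random harmonic frame and return (rows, mu_F, nu_F)."""
    rng = random.Random(seed)
    # Keep row k of the N x N DFT independently with probability M / N.
    rows = [k for k in range(N) if rng.random() < M / N]
    if not rows:
        raise ValueError("no rows were selected; try another seed")
    size = len(rows)
    # After column normalization, <f_i, f_j> depends only on d = i - j (mod N).
    corr = [
        sum(cmath.exp(2j * math.pi * k * d / N) for k in rows) / size
        for d in range(1, N)
    ]
    mu = max(abs(c) for c in corr)
    # For every i, the sum over j != i runs over all nonzero differences d.
    nu = abs(sum(corr)) / (N - 1)
    return rows, mu, nu


rows, mu, nu = random_harmonic_coherence(N=512, M=128, seed=1)
print(len(rows), 0 in rows, mu, nu, 1 / (512 - 1))
```

When the zeroth row is absent, the printed average coherence should agree with $1/(N-1)$ up to rounding; when it is present, the same computation gives $(N/|\mathcal{M}| - 1)/(N-1)$ instead.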

3.3. Gabor and chirp frames 

Gabor frames constitute an important class of frames, as they appear in a variety of applications such as radar 
E9l . speech processing [53 1, and quantum information theory [43 1. Given a nonzero seed function / : 1m — > C, we 
produce all time- and frequency-shifted versions: $f_{xy}(t) := f(t - x)e^{2\pi i yt/M}$, $t \in \mathbb{Z}_M$. Viewing these shifted functions as
vectors in C M gives an M x M 2 Gabor frame. The following theorem characterizes the spectral norm and the worst- 
case and average coherence of Gabor frames generated from either a deterministic Alltop vector |Q] or a random 
Steinhaus vector. 

Theorem 12 (Geometry of Gabor frames). Take an Alltop function defined by $f(t) := \frac{1}{\sqrt{M}}e^{2\pi i t^3/M}$, $t \in \mathbb{Z}_M$. Also, take a random Steinhaus function defined by $g(t) := \frac{1}{\sqrt{M}}e^{2\pi i\theta_t}$, $t \in \mathbb{Z}_M$, where the $\theta_t$'s are independent random variables distributed uniformly on the unit interval. Then the $M \times M^2$ Gabor frames $F$ and $G$ generated by $f$ and $g$, respectively, are unit norm and tight, that is, $\|F\|_2 = \|G\|_2 = \sqrt{M}$, and both frames have average coherence $\le \frac{1}{M+1}$. Furthermore, if $M \ge 5$ is prime, then $\mu_F = \frac{1}{\sqrt{M}}$, while if $M \ge 13$, then $\mu_G \le \sqrt{(13\log M)/M}$ with probability exceeding $1 - 4M^{-1}$.

Proof. The tightness claim follows from [35], in which it was shown that Gabor frames generated by nonzero seed vectors are tight. The bound on average coherence is a consequence of [5, Theorem 7] concerning arbitrary Gabor frames. The claim concerning $\mu_F$ follows directly from [46], while the claim concerning $\mu_G$ is a simple consequence
of ESI Theorem 5.1]. □ 

Instead of taking all translates and modulates of a seed function, [16] constructs chirp frames by taking all powers and modulates of a chirp function. Picking $M$ to be prime, we start with a chirp function $h_M : \mathbb{Z}_M \to \mathbb{C}$ defined by $h_M(t) := e^{\pi i t(t-M)/M}$, $t \in \mathbb{Z}_M$. The $M^2$ frame elements are then defined entrywise by $h_{ab}(t) := \frac{1}{\sqrt{M}}h_M(t)^a e^{2\pi i bt/M}$, $t \in \mathbb{Z}_M$. Certainly, chirp frames are, at the very least, similar in spirit to Gabor frames. As a matter of fact, the
chirp frame is in some sense equivalent to the Gabor frame generated by the Alltop function: it is easy to verify 
that /i(-6.r,y-3.r 2 )( f ) = fxyQ), and when M > 5, the map (x,y) h-> (—6x,y - 3x 2 ) is a permutation over 1? M . 

Using terminology from Definition 28, we say the chirp frame is wiggling equivalent to a unitary rotation of permuted Alltop Gabor frame elements. As such, by Lemma 29, the chirp frame has the same spectral norm and worst-case coherence as the Alltop Gabor frame, but the average coherence may be different. In this case, the average coherence still satisfies (SCP-2). Indeed, adding the frame elements gives

$$\sum_{a=0}^{M-1}\sum_{b=0}^{M-1} h_{ab}(t) = \frac{1}{\sqrt{M}}\sum_{a=0}^{M-1} h_M(t)^a \sum_{b=0}^{M-1} e^{2\pi i bt/M} = \frac{1}{\sqrt{M}}\sum_{a=0}^{M-1} h_M(t)^a\, M\delta_0(t) = \sqrt{M}\Big(\sum_{a=0}^{M-1} h_M(0)^a\Big)\delta_0(t) = M^{3/2}\delta_0(t),$$

and so $\langle h_{a'b'}, \sum_{a=0}^{M-1}\sum_{b=0}^{M-1} h_{ab}\rangle = \langle h_{a'b'}, M^{3/2}\delta_0\rangle = M^{3/2}h_{a'b'}(0) = M = \frac{N}{M}$. Therefore, Lemma 7(i) gives the result:

Theorem 13 (Geometry of chirp frames). Pick $M$ prime, and let $H$ be the $M \times M^2$ frame of all powers and modulates of the chirp function $h_M$. Then $H$ is a unit norm tight frame with $\|H\|_2 = \sqrt{M}$, and has worst-case coherence $\mu_H = \frac{1}{\sqrt{M}}$ and average coherence $\nu_H \le \frac{\mu_H}{\sqrt{M}}$.
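As a sanity check on Theorem 13, the chirp frame for a small prime can be built explicitly. This is a minimal sketch, assuming the chirp is $h_M(t) = e^{\pi i t(t-M)/M}$ and the frame elements are $h_{ab}(t) = M^{-1/2}h_M(t)^a e^{2\pi i bt/M}$ as read from the text above; the helper names `chirp_frame` and `coherences` are hypothetical.

```python
import cmath
import math


def chirp_frame(M):
    """Return the M^2 chirp frame elements h_ab as lists of length M."""
    cols = []
    for a in range(M):
        for b in range(M):
            cols.append([
                # h_M(t)^a = exp(pi*i*a*t*(t - M)/M), then modulate and normalize.
                cmath.exp(1j * math.pi * a * t * (t - M) / M)
                * cmath.exp(2j * math.pi * b * t / M)
                / math.sqrt(M)
                for t in range(M)
            ])
    return cols


def coherences(cols):
    """Return (worst-case coherence, average coherence) of unit norm columns."""
    n = len(cols)
    gram = [
        [sum(x.conjugate() * y for x, y in zip(cols[i], cols[j])) for j in range(n)]
        for i in range(n)
    ]
    mu = max(abs(gram[i][j]) for i in range(n) for j in range(n) if i != j)
    nu = max(
        abs(sum(gram[i][j] for j in range(n) if j != i)) for i in range(n)
    ) / (n - 1)
    return mu, nu


mu_H, nu_H = coherences(chirp_frame(5))
print(mu_H, 1 / math.sqrt(5))  # worst-case coherence next to 1/sqrt(M)
print(nu_H, 1 / 6)             # average coherence next to (M - 1)/(M^2 - 1)
```

For $M = 5$ the frame elements sum to $M^{3/2}\delta_0$, so the average coherence equals $(M-1)/(M^2-1) = 1/6$, which is below $\mu_H/\sqrt{M} = 1/5$.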



Example 14. To illustrate the bounds in Theorems 12 and 13, we consider the examples of an Alltop Gabor frame and a chirp frame, each with $M = 5$. In this case, the Gabor frame has $\nu_F \approx 0.1348 \le 0.1667 \approx \frac{1}{M+1}$, while the chirp frame has $\nu_H = \frac{1}{6} \le \frac{1}{5} = \frac{\mu_H}{\sqrt{M}}$. Note the Gabor and chirp frames have different average coherences despite being equivalent in some sense. For the random Steinhaus Gabor frame, we ran simulations in MATLAB and observed 30 realizations for each $M = 60, 70, 80$. The distributions of $\nu_G$ and $\mu_G$ were rather tight, so we only report the ranges of values attained, along with the bounds given in Theorem 12:

$M = 60$:   $\nu_G \in [0.3916, 0.5958] \times 10^{-2} \le 0.0164$
            $\mu_G \in [0.3242, 0.4216] \le 0.9419$

$M = 70$:   $\nu_G \in [0.3151, 0.4532] \times 10^{-2} \le 0.0141$
            $\mu_G \in [0.2989, 0.3814] \le 0.8883$

$M = 80$:   $\nu_G \in [0.2413, 0.3758] \times 10^{-2} \le 0.0124$
            $\mu_G \in [0.2711, 0.3796] \le 0.8439$

These simulations seem to indicate that the bound on $\nu_G$ is conservative by an order of magnitude.

3.4. Spherical 2-designs 

Lemma[7|ii) leads one to consider frames of vectors that sum to zero. In OH . it is proved that real unit norm tight 
frames with this property make up another well-studied class of vector packings: spherical 2-designs. To be clear, a 
collection of unit-norm vectors $F \subseteq \mathbb{R}^M$ is called a spherical $t$-design if, for every polynomial $g(x_1, \ldots, x_M)$ of degree at most $t$, we have

$$\frac{1}{H^{M-1}(S^{M-1})}\int_{S^{M-1}} g(x)\,dH^{M-1}(x) = \frac{1}{|F|}\sum_{f \in F} g(f),$$

where $S^{M-1}$ is the unit hypersphere in $\mathbb{R}^M$ and $H^{M-1}$ denotes the $(M-1)$-dimensional Hausdorff measure on $S^{M-1}$. In words, vectors that form a spherical $t$-design serve as good representatives when calculating the average value of a degree-$t$ polynomial over the unit hypersphere. Today, such designs find application in quantum state estimation [28].

Since real unit norm tight frames always exist for $N \ge M + 1$, one might suspect that spherical 2-designs are equally common, but this intuition is faulty: the sum-to-zero condition introduces certain issues. For example, there is no spherical 2-design when $M$ is odd and $N = M + 2$. In [36], spherical 2-designs are explicitly characterized by construction. The following theorem gives a construction based on harmonic frames:

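Theorem 15 (Geometry of spherical 2-designs). Pick $M$ even and $N \ge 2M$. Take an $\frac{M}{2} \times N$ harmonic frame $G$ by collecting rows from a discrete Fourier transform matrix according to a set of nonzero indices $\mathcal{M}$ and normalize the columns. Let $m(n)$ denote the $n$th largest index in $\mathcal{M}$, and define a real $M \times N$ frame $F$ by

$$F_{k\ell} := \begin{cases} \sqrt{\frac{2}{M}}\cos\Big(\frac{2\pi m(\frac{k+1}{2})\ell}{N}\Big), & k \text{ odd} \\[4pt] \sqrt{\frac{2}{M}}\sin\Big(\frac{2\pi m(\frac{k}{2})\ell}{N}\Big), & k \text{ even} \end{cases}, \qquad k = 1, \ldots, M,\ \ell = 0, \ldots, N-1.$$

Then $F$ is unit norm and tight, i.e., $\|F\|_2^2 = \frac{N}{M}$, with worst-case coherence $\mu_F \le \mu_G$ and average coherence $\nu_F \le \frac{\mu_F}{\sqrt{M}}$.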



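Proof. It is easy to verify that $F$ is a unit norm tight frame using the geometric sum formula. Also, since the frame elements sum to zero and $N \ge 2M$, the claim regarding average coherence follows from Lemma 7(ii). It remains to prove $\mu_F \le \mu_G$. For each distinct pair of indices $i, j \in \{1, \ldots, N\}$, we have

$$\langle f_i, f_j\rangle = \frac{2}{M}\sum_{m \in \mathcal{M}}\Big(\cos\big(\tfrac{2\pi mi}{N}\big)\cos\big(\tfrac{2\pi mj}{N}\big) + \sin\big(\tfrac{2\pi mi}{N}\big)\sin\big(\tfrac{2\pi mj}{N}\big)\Big) = \frac{2}{M}\sum_{m \in \mathcal{M}}\cos\Big(\frac{2\pi m(i-j)}{N}\Big) = \mathrm{Re}\langle g_i, g_j\rangle,$$

and so $|\langle f_i, f_j\rangle| = |\mathrm{Re}\langle g_i, g_j\rangle| \le |\langle g_i, g_j\rangle|$. This gives the result. □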



Example 16. To illustrate the bounds in Theorem 15, we consider the spherical 2-design constructed from a $9 \times 37$ harmonic equiangular tight frame [54]. Specifically, we take a $37 \times 37$ DFT matrix, choose nonzero row indices

$$\mathcal{M} = \{1, 7, 9, 10, 12, 16, 26, 33, 34\},$$

and normalize the columns to get a harmonic frame $G$ whose worst-case coherence achieves the Welch bound: $\mu_G = \sqrt{\frac{37-9}{9(37-1)}} \approx 0.2940$. Following Theorem 15, we produce a spherical 2-design $F$ with $\mu_F \approx 0.1967 \le \mu_G$ and $\nu_F \approx 0.0278 \le 0.0464 \approx \frac{\mu_F}{\sqrt{M}}$.
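The numbers in Example 16 can be reproduced from the index set alone. The following is a minimal sketch, assuming the construction of Theorem 15 with one cosine row and one sine row per selected DFT row; the names `harmonic_coherence`, `design_frame`, and `real_coherences` are hypothetical helpers introduced for illustration.

```python
import cmath
import math

N = 37
ROWS = [1, 7, 9, 10, 12, 16, 26, 33, 34]  # nonzero DFT row indices from Example 16


def harmonic_coherence():
    """Worst-case coherence of the 9 x 37 harmonic frame G."""
    # <g_i, g_j> depends only on the column difference d = i - j (mod N).
    return max(
        abs(sum(cmath.exp(2j * math.pi * m * d / N) for m in ROWS)) / len(ROWS)
        for d in range(1, N)
    )


def design_frame():
    """Real 18 x 37 frame F: a cosine row and a sine row per selected DFT row."""
    M = 2 * len(ROWS)
    scale = math.sqrt(2 / M)
    cols = []
    for col_index in range(N):
        col = []
        for m in ROWS:
            angle = 2 * math.pi * m * col_index / N
            col.extend([scale * math.cos(angle), scale * math.sin(angle)])
        cols.append(col)
    return cols


def real_coherences(cols):
    """Return (worst-case coherence, average coherence) of real unit norm columns."""
    n = len(cols)
    gram = [
        [sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(n)]
        for i in range(n)
    ]
    mu = max(abs(gram[i][j]) for i in range(n) for j in range(n) if i != j)
    nu = max(
        abs(sum(gram[i][j] for j in range(n) if j != i)) for i in range(n)
    ) / (n - 1)
    return mu, nu


mu_G = harmonic_coherence()
mu_F, nu_F = real_coherences(design_frame())
print(mu_G, mu_F, nu_F, mu_F / math.sqrt(18))
```

Because the selected row indices are all nonzero, the columns of $F$ sum to zero, so the average coherence collapses to $1/(N-1) = 1/36$.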



3.5. Steiner equiangular tight frames 

We now consider a construction that dates back to Seidel with [44], and was recently developed further in [24]. Here, a special type of block design is used to build an equiangular tight frame (ETF), that is, a tight frame in which the modulus of every inner product between frame elements achieves the Welch bound. Let's start with a definition:

Definition 17. A $(t, k, v)$-Steiner system is a $v$-element set $V$ with a collection of $k$-element subsets of $V$, called blocks, with the property that any $t$-element subset of $V$ is contained in exactly one block. The $\{0, 1\}$-incidence matrix $A$ of a Steiner system has entries $A_{ij}$, where $A_{ij} = 1$ if the $i$th block contains the $j$th element, and otherwise $A_{ij} = 0$.

One example of a Steiner system is a set with all possible two-element blocks. This forms a (2, 2, v)-Steiner system 
because every pair of elements is contained in exactly one block. The following theorem details how [24] constructs 
ETFs using Steiner systems. 

Theorem 18 (Constructing Steiner equiangular tight frames [24]). Every $(2, k, v)$-Steiner system can be used to build a $\frac{v(v-1)}{k(k-1)} \times v(1 + \frac{v-1}{k-1})$ equiangular tight frame $F$ according to the following procedure:

(i) Let $A$ be the $\frac{v(v-1)}{k(k-1)} \times v$ incidence matrix of a $(2, k, v)$-Steiner system.

(ii) Let $H$ be the $(1 + \frac{v-1}{k-1}) \times (1 + \frac{v-1}{k-1})$ discrete Fourier transform matrix.

(iii) For each $j = 1, \ldots, v$, let $F_j$ be a $\frac{v(v-1)}{k(k-1)} \times (1 + \frac{v-1}{k-1})$ matrix obtained from the $j$th column of $A$ by replacing each of the one-valued entries with a distinct row of $H$, and every zero-valued entry with a row of zeros.

(iv) Concatenate and rescale the $F_j$'s to form $F = (\frac{k-1}{v-1})^{1/2}[F_1 \cdots F_v]$.

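As an example, we build an ETF from a $(2,2,3)$-Steiner system. In this case, the incidence matrix is

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$

For this matrix, each row represents a block. Since each block contains two elements, each row of the matrix has two ones. Also, any two elements determine a unique common row, and so any two columns have a single one in common. To form the corresponding $3 \times 9$ ETF $F$, we use the $3 \times 3$ DFT matrix. Letting $\omega = e^{2\pi i/3}$, we have

$$H = \begin{bmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{bmatrix}.$$

Finally, we replace the two ones in each column of $A$ with the second and third rows of $H$. Normalizing the columns gives a $3 \times 9$ ETF:

$$F = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & \omega & \omega^2 & 1 & \omega & \omega^2 & 0 & 0 & 0 \\ 1 & \omega^2 & \omega & 0 & 0 & 0 & 1 & \omega & \omega^2 \\ 0 & 0 & 0 & 1 & \omega^2 & \omega & 1 & \omega^2 & \omega \end{bmatrix}. \tag{19}$$

Name | R/C | Size | $\mu_F$ | $\nu_F$ | Restrictions | Probability
Normalized Gaussian | R | $M \times N$ | $\le \frac{\sqrt{15\log N}}{\sqrt{M} - \sqrt{12\log N}}$ | $\le \frac{\sqrt{15\log N}}{M - \sqrt{12M\log N}}$ | $60\log N \le M \le \frac{N-1}{4\log N}$ | see Theorem 8
Random harmonic | C | $|\mathcal{M}| \times N$, $\frac{1}{2}M \le |\mathcal{M}| \le \frac{3}{2}M$ | $\le \sqrt{\frac{118(N-M)\log N}{MN}}$ | $\le \frac{\mu_F}{\sqrt{|\mathcal{M}|}}$ | $16\log N \le M \le \frac{N}{3}$ | $\ge 1 - 4N^{-1} - N^{-2}$
Alltop Gabor | C | $M \times M^2$ | $= \frac{1}{\sqrt{M}}$ | $\le \frac{1}{M+1}$ | $M \ge 5$ prime | Deterministic
Steinhaus Gabor | C | $M \times M^2$ | $\le \sqrt{\frac{13\log M}{M}}$ | $\le \frac{1}{M+1}$ | $M \ge 13$ | $\ge 1 - 4M^{-1}$
Chirp | C | $M \times M^2$ | $= \frac{1}{\sqrt{M}}$ | $\le \frac{\mu_F}{\sqrt{M}}$ | $M$ prime | Deterministic
Spherical 2-design from harmonic $G$ | R | $M \times N$ | $\le \mu_G$ | $\le \frac{\mu_F}{\sqrt{M}}$ | $M$ even, $N \ge 2M$ | Deterministic
Steiner | C | $M \times N$, $M = \frac{v(v-1)}{k(k-1)}$, $N = v(1 + \frac{v-1}{k-1})$ | $= \sqrt{\frac{N-M}{M(N-1)}}$ | $\le \frac{\mu_F}{\sqrt{M}}$ | $\exists\,(2,k,v)$-Steiner system | Deterministic
Code-based | R | $2^m \times 2^{(t+1)m}$ | $\le \frac{1}{\sqrt{2^{m-2t-1}}}$ | $\le \frac{\mu_F}{\sqrt{2^m}}$ | None | Deterministic

Table 1: Eight constructions detailed in this paper. All of these are unit norm tight frames except for the normalized Gaussian frame, which has squared spectral norm $\|F\|_2^2 \le (\sqrt{M} + \sqrt{N} + \sqrt{2\log N})^2/(M - \sqrt{8M\log N})$ in the same probability event as is measured above.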



Several infinite families of $(2, k, v)$-Steiner systems are already known, and Theorem 18 says that each one can be used to build an ETF. See [24] for a complete discussion of this construction and how it relates to each known family of Steiner systems. Interestingly, every Steiner ETF satisfies $N \ge 2M$. If, in step (iii) of Theorem 18, we choose the distinct rows to be the rows of the DFT $H$ that are not all-ones, then the sum of columns of each $F_j$ is zero, meaning the sum of columns of $F$ is also zero. This was done in the example above, and the columns sum to zero, accordingly. Therefore, by Lemma 7(ii), Steiner ETFs satisfy (SCP-2). This gives the following theorem:

Theorem 19 (Geometry of Steiner equiangular tight frames). Build an $M \times N$ matrix $F$ according to Theorem 18, and in step (iii), choose rows from the discrete Fourier transform matrix $H$ that are not all-ones. Then $F$ is an equiangular tight frame, meaning $\|F\|_2^2 = \frac{N}{M}$ and $\mu_F^2 = \frac{N-M}{M(N-1)}$, and has average coherence $\nu_F \le \frac{\mu_F}{\sqrt{M}}$.

Example 20. To illustrate the bound in Theorem 19, we note that the example given in (19) has $\nu_F = \frac{1}{8} \le \frac{1}{2\sqrt{3}} = \frac{\mu_F}{\sqrt{M}}$.
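The $(2,2,3)$ example can be assembled programmatically by following steps (i)-(iv) of Theorem 18. This is a minimal sketch under the assumption that the incidence matrix has rows $\{1,2\}$, $\{1,3\}$, $\{2,3\}$ and that the second and third rows of the $3 \times 3$ DFT are used; the function names `steiner_etf` and `column_coherences` are hypothetical.

```python
import cmath
import math

OMEGA = cmath.exp(2j * math.pi / 3)
# Incidence matrix of the (2,2,3)-Steiner system: one row per block.
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
# 3 x 3 DFT matrix; only its rows that are not all-ones are used below.
H = [[1, 1, 1], [1, OMEGA, OMEGA ** 2], [1, OMEGA ** 2, OMEGA]]


def steiner_etf(incidence, dft_rows, scale):
    """Replace the ones in each column of the incidence matrix by distinct DFT rows."""
    blocks, v = len(incidence), len(incidence[0])
    width = len(dft_rows[0])
    frame = [[0j] * (v * width) for _ in range(blocks)]
    for j in range(v):
        used = 0  # index of the next unused DFT row for column j
        for i in range(blocks):
            if incidence[i][j]:
                for c in range(width):
                    frame[i][j * width + c] = scale * dft_rows[used][c]
                used += 1
    return frame


def column_coherences(frame):
    """Return (min off-diagonal modulus, max off-diagonal modulus, average coherence)."""
    rows, n = len(frame), len(frame[0])
    gram = [
        [sum(frame[r][i].conjugate() * frame[r][j] for r in range(rows)) for j in range(n)]
        for i in range(n)
    ]
    off = [abs(gram[i][j]) for i in range(n) for j in range(n) if i != j]
    nu = max(
        abs(sum(gram[i][j] for j in range(n) if j != i)) for i in range(n)
    ) / (n - 1)
    return min(off), max(off), nu


# Rescale by sqrt((k - 1)/(v - 1)) = sqrt(1/2) for k = 2, v = 3.
F = steiner_etf(A, H[1:], math.sqrt(1 / 2))
print(column_coherences(F))
```

Equal minimum and maximum off-diagonal moduli indicate equiangularity, and the average coherence equals $1/(N-1) = 1/8$ because the columns sum to zero.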
3.6. Code-based frames 

Many structures in coding theory are also useful in frame theory. In this section, we build frames from a code that originally emerged with Berlekamp in [9], and found recent reincarnation with [55]. We build a $2^m \times 2^{(t+1)m}$ frame, indexing rows by elements of $\mathbb{F}_{2^m}$ and indexing columns by $(t+1)$-tuples of elements from $\mathbb{F}_{2^m}$. For $x \in \mathbb{F}_{2^m}$ and $\alpha \in \mathbb{F}_{2^m}^{t+1}$, the corresponding entry of the matrix $F$ is given by

$$F_{x\alpha} = \frac{1}{\sqrt{2^m}}(-1)^{\mathrm{Tr}[\alpha_0 x + \sum_{i=1}^{t}\alpha_i x^{2^i+1}]}, \tag{20}$$

where $\mathrm{Tr} : \mathbb{F}_{2^m} \to \mathbb{F}_2$ denotes the trace map, defined by $\mathrm{Tr}(z) = \sum_{i=0}^{m-1} z^{2^i}$. The following theorem gives the spectral norm and the worst-case and average coherence of this frame.

Theorem 21 (Geometry of code-based frames). The $2^m \times 2^{(t+1)m}$ frame defined by (20) is unit norm and tight, i.e., $\|F\|_2^2 = 2^{tm}$, with worst-case coherence $\mu_F \le \frac{1}{\sqrt{2^{m-2t-1}}}$ and average coherence $\nu_F \le \frac{\mu_F}{\sqrt{2^m}}$.

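Proof. For the tightness claim, we use the linearity of the trace map to write the inner product of rows $x$ and $y$:

$$\sum_{\alpha \in \mathbb{F}_{2^m}^{t+1}} \frac{1}{\sqrt{2^m}}(-1)^{\mathrm{Tr}[\alpha_0 x + \sum_{i=1}^{t}\alpha_i x^{2^i+1}]}\,\frac{1}{\sqrt{2^m}}(-1)^{\mathrm{Tr}[\alpha_0 y + \sum_{i=1}^{t}\alpha_i y^{2^i+1}]} = \frac{1}{2^m}\sum_{\alpha_0 \in \mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_0(x+y)]}\sum_{\alpha_1 \in \mathbb{F}_{2^m}}\cdots\sum_{\alpha_t \in \mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\sum_{i=1}^{t}\alpha_i(x^{2^i+1} + y^{2^i+1})]}.$$

This expression is $2^{tm}$ when $x = y$. Otherwise, note that $\alpha_0 \mapsto (-1)^{\mathrm{Tr}[\alpha_0(x+y)]} \in \{\pm 1\}$ defines a homomorphism on $\mathbb{F}_{2^m}$. Since $(x+y)^{-1} \mapsto -1$, the inverse images of $\pm 1$ under this homomorphism must form two cosets of equal size, and so $\sum_{\alpha_0 \in \mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_0(x+y)]} = 0$, meaning distinct rows in $F$ are orthogonal. Thus, $F$ is a unit norm tight frame.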
For the worst-case coherence claim, we first note that the linearity of the trace map gives 

, ]\Tr[a ;c+2! = i ff ' Jr2 +1 ] (_ l) Tr [ a o x+ 2! = i _ (_^Tr[(» +a'o)-r+Ej =1 (Q',+ff;).t 2 ' +1 ] 

i.e., every inner product between columns of F is a sum over another column. Thus, there exists a e FS, 1 such that 



2 2 "74 = ( J] ( _l)Tr[ao^ELi^ i+1 ]) 2 = 2" 1 + J] J] (-l) Tf [ 



ao(**y)+ZLi »/((-»+>') 2/+1 +E£,(^) 2J ' (*+>■)- 



where the last equality is by the identity (jc + y) 2 +1 = x 2 ' +l + y 2 +I + Zj^oOoO 2 ^ + y) 2 2 + +1 > whose proof is a 
simple exercise of induction. From here, we perform a change of variables: u := x + y and v := xy. Notice that 
(u, v) corresponds to (x,y) for some x + y whenever (z + x)(z + y) = z 2 + uz + v has two solutions, that is, whenever 
Tr(^) = 0. Since (u, v) corresponds to both (x,y) and (y,x), we must correct for under-counting: 



2 2m p? F - 2"' + 2 ^ ^ (_i) Tr [ u 'o»+Z| = i «,(» 2 ' +1 +Ey = ' i' 2J " 2 ' 



')] 



ueF 2 m veF 2 m 
"#0 Tr(v/« 2 )=0 



= 2"' + 2 ^ (_i) Tl "[°'o" + 2| = i Q--" 2 '* 1 ] ^ (_i) Tr [(S=i 2',ia 2 J « 2 ' J 2+2 1 



HeF 2 m veF 2 m 
"#0 Tr(v/M 2 )=0 



<2'"+2^ X 



Tr[p(«)v] 



ueF 2 m veF 2 »? 
K#0 Tr(v/« 2 )=0 



(21) 



where the second equality is by repeated application of Tr(z) = Tr(z 2 ), and p(u) := 5j>0 " 2 2+2 ^° bound 
p F , we will count the m's that produce nonzero summands in pTj ). 

For each m ^ 0, we have a homomorphism^-,, : {v e F2™ : Tr(-^) = 0} — > {+1} defined by ^„(v) := (- l) Tl W M M _ 
Pick m # for which there exists a v such that both Tr(-^) = and Tr[p(u)v] = 1. Then^„(v) = -1, and so the 
kernel of^„ is the same size as the coset {v e Fa» : Tr(^) = 0,Xu(v) = — 1}, meaning the summand associated with u 
in ( f2T| is zero. Hence, the nonzero summands in pTj ) require Tr(^) = and Tr[p(«)v] = 0. This is certainly possible 
whenever p(u) = 0. Exponentiation gives 

t i-i 
i=l 7=0 

which has degree 2 2f_1 — 2 . Thus, p(u) = has at most 2 2 '~' - 2 i_I solutions, and each such u produces a summand 
in pT) of size 2 m_1 . Next, we consider the u's for which Tr(^) = 0, Tr[p(u)v] = 0, and p(u) + 0. In this case, the 
hyperplanes defined by Tr(-^) = and Ti[p(u)v] — are parallel, and so p(u) = 4. Here, 

i=l j=0 

which has degree 2 2 '~' +2' _1 . Thus, p(u) = -7 has at most 2 2 '~' + 2'~' solutions, and each such u produces a summand 
in((2T}ofsize2 ffl - 1 . We can now continue the bound from ((2TJ: 2 2 > 2 < 2"'+2(2 2 '- 1 -2 f - 1 +2 2 '- 1 +2'- 1 )2 ffl - 1 < 2" ,+2 ' +1 . 
From here, isolating /if gives the claim. 

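Lastly, for the average coherence, pick some $x \in \mathbb{F}_{2^m}$. Then summing the entries in the $x$th row gives

$$\sum_{\alpha \in \mathbb{F}_{2^m}^{t+1}} \frac{1}{\sqrt{2^m}}(-1)^{\mathrm{Tr}[\alpha_0 x + \sum_{i=1}^{t}\alpha_i x^{2^i+1}]} = \frac{1}{\sqrt{2^m}}\sum_{\alpha_0 \in \mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_0 x]}\sum_{\alpha_1 \in \mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_1 x^{3}]}\cdots\sum_{\alpha_t \in \mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_t x^{2^t+1}]} = \begin{cases} 2^{(t+1/2)m}, & x = 0 \\ 0, & x \neq 0. \end{cases}$$

That is, the frame elements sum to a multiple of an identity basis element: $\sum_{\alpha \in \mathbb{F}_{2^m}^{t+1}} f_\alpha = 2^{(t+1/2)m}\delta_0$. Since every entry in row $x = 0$ is $\frac{1}{\sqrt{2^m}}$, we have $\langle f_{\alpha'}, \sum_{\alpha \in \mathbb{F}_{2^m}^{t+1}} f_\alpha\rangle = 2^{tm} = \frac{2^{(t+1)m}}{2^m}$ for every $\alpha' \in \mathbb{F}_{2^m}^{t+1}$, and so by Lemma 7(i), we are done. □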



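Example 22. To illustrate the bounds in Theorem 21, we consider the example where $m = 4$ and $t = 1$. This is a $16 \times 256$ code-based frame $F$ with $\mu_F = \frac{1}{2} \le \frac{1}{\sqrt{2}} = \frac{1}{\sqrt{2^{m-2t-1}}}$ and $\nu_F = \frac{1}{17} \le \frac{1}{8} = \frac{\mu_F}{\sqrt{2^m}}$.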
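The case $m = 4$, $t = 1$ is small enough to check by brute force. The sketch below is an illustration under stated assumptions: the field $\mathbb{F}_{16}$ is represented with the irreducible polynomial $x^4 + x + 1$ (any irreducible quartic would serve), entries follow (20) with $x^{2^1+1} = x^3$, and the helper names `gf16_mul`, `trace`, and `column_inner_product` are hypothetical. It relies on the fact, used in the proof above, that the inner product of columns $\alpha$ and $\alpha'$ depends only on $\alpha + \alpha'$.

```python
def gf16_mul(a, b):
    """Multiply in GF(16) = GF(2)[x] / (x^4 + x + 1); elements are ints 0..15."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce modulo x^4 + x + 1
    return result


def trace(z):
    """Tr(z) = z + z^2 + z^4 + z^8, which lands in {0, 1}."""
    total, power = 0, z
    for _ in range(4):
        total ^= power
        power = gf16_mul(power, power)
    return total


def column_inner_product(b0, b1):
    """<f_alpha, f_alpha'> for alpha + alpha' = (b0, b1), with m = 4 and t = 1."""
    total = 0
    for x in range(16):
        x3 = gf16_mul(x, gf16_mul(x, x))
        total += (-1) ** trace(gf16_mul(b0, x) ^ gf16_mul(b1, x3))
    return total / 16


# All inner products between distinct columns, indexed by the nonzero sum (b0, b1).
values = [
    column_inner_product(b0, b1)
    for b0 in range(16)
    for b1 in range(16)
    if (b0, b1) != (0, 0)
]
mu_F = max(abs(v) for v in values)
nu_F = abs(sum(values)) / (256 - 1)
print(mu_F, nu_F)
```

Since addition in $\mathbb{F}_{2^m}^{t+1}$ permutes the columns, the sum over $\alpha' \neq \alpha$ is the same for every $\alpha$, which is why a single sum suffices for the average coherence.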

4. Fundamental limits on worst-case coherence 

In many applications of frames, performance is dictated by worst-case coherence 01 ITT1 [2T1 [3T1 r37l 14^71 [501 . 
It is therefore particularly important to understand which worst-case coherence values are achievable. To this end, the 
Welch bound is commonly used in the literature. When worst-case coherence achieves the Welch bound, the frame is 
equiangular and tight [46]; one of the biggest open problems in frame theory concerns equiangular tight frames [43]. However, equiangular tight frames cannot have more vectors than the square of the spatial dimension [46], meaning the Welch bound is not tight whenever $N > M^2$. When the number of vectors $N$ is exceedingly large, the following theorem gives a better bound:

Theorem 23 ([2, 39]). Every sufficiently large $M \times N$ unit norm frame $F$ with $N \ge 2M$ and worst-case coherence $\mu_F \le \frac{1}{2}$ satisfies

$$\mu_F^2\log\Big(\frac{1}{\mu_F}\Big) \ge \frac{C\log N}{M} \tag{22}$$

for some constant $C > 0$.

For a fixed worst-case coherence $\mu_F \le \frac{1}{2}$, this bound indicates that the number of vectors $N$ cannot exceed some exponential in the spatial dimension $M$, that is, $N \le a^M$ for some $a > 0$. However, since the constant $C$ is not established in this theorem, it is unclear which base $a$ is appropriate for each $\mu_F$. The following theorem is a little more explicit in this regard:

Theorem 24 ([38, 54]). Every $M \times N$ unit norm frame $F$ has worst-case coherence $\mu_F \ge 1 - 2N^{-1/(M-1)}$. Furthermore, taking $N = \Theta(a^M)$, this lower bound goes to $1 - \frac{2}{a}$ as $M \to \infty$.



For many applications, it does not make sense to use a complex frame, but the bound in Theorem 24 is known to be loose for real frames [18]. We therefore improve Theorems 23 and 24 for the case of real unit norm frames:

Theorem 25. Every real $M \times N$ unit norm frame $F$ has worst-case coherence

$$\mu_F \ge \cos\left[\pi\left(\frac{M-1}{N\pi^{1/2}}\,\frac{\Gamma(\frac{M-1}{2})}{\Gamma(\frac{M}{2})}\right)^{\frac{1}{M-1}}\right]. \tag{23}$$

Furthermore, taking $N = \Theta(a^M)$, this lower bound goes to $\cos(\frac{\pi}{a})$ as $M \to \infty$.

Before proving this theorem, we first consider the special case where the spatial dimension is $M = 3$:

Lemma 26. Given $N$ points on the unit sphere $S^2 \subseteq \mathbb{R}^3$, the smallest angle between points is $\le 2\cos^{-1}(1 - \frac{2}{N})$.

Proof. We first claim there exists a closed spherical cap in $S^2$ with area $\frac{4\pi}{N}$ that contains two of the points. Suppose otherwise, and take $\gamma$ to be the angular radius of a spherical cap with area $\frac{4\pi}{N}$. That is, $\gamma$ is the angle between the center of the cap and every point on the boundary. Since the cap is closed, we must have that the smallest angle $\alpha$ between any two of our $N$ points satisfies $\alpha > 2\gamma$. Let $C(p, \theta)$ denote the closed spherical cap centered at $p \in S^2$ of angular radius $\theta$, and let $P$ denote our set of $N$ points. Then we know for $p \in P$, the $C(p, \gamma)$'s are disjoint, $\frac{\alpha}{2} > \gamma$, and $\bigcup_{p \in P} C(p, \frac{\alpha}{2}) \subseteq S^2$, and so taking 2-dimensional Hausdorff measures on the sphere gives

$$H^2(S^2) = 4\pi = H^2\Big(\bigcup_{p \in P} C(p, \gamma)\Big) < H^2\Big(\bigcup_{p \in P} C(p, \tfrac{\alpha}{2})\Big) \le H^2(S^2),$$

a contradiction.

Since two of the points reside in a spherical cap of area $\frac{4\pi}{N}$, we know $\alpha$ is no more than twice the radius of this cap. We use spherical coordinates to relate the cap's area to the radius: $H^2(C(\cdot, \gamma)) = 2\pi\int_0^{\gamma}\sin\phi\,d\phi = 2\pi(1 - \cos\gamma)$. Therefore, when $H^2(C(\cdot, \gamma)) = \frac{4\pi}{N}$, we have $\gamma = \cos^{-1}(1 - \frac{2}{N})$, and so $\alpha \le 2\gamma$ gives the result. □

Figure 1: Different bounds on worst-case coherence $\mu_F$ for $M = 3$, $N = 3, \ldots, 55$. Stars give numerically determined optimal worst-case coherence of $N$ real unit vectors, found in [18]. Dotted curve gives Welch bound, dash-dotted curve gives bound from Theorem 24, dashed curve gives bound from Theorem 25, and solid curve gives bound from Theorem 27.



Theorem 27. Every real $3 \times N$ unit norm frame $F$ has worst-case coherence $\mu_F \ge 1 - \frac{4}{N} + \frac{2}{N^2}$.

Proof. Packing $N$ unit vectors in $\mathbb{R}^3$ corresponds to packing $2N$ antipodal points in $S^2$, and so Lemma 26 gives $\alpha \le 2\cos^{-1}(1 - \frac{1}{N})$. Applying the double angle formula to $\mu_F = \cos\alpha \ge \cos[2\cos^{-1}(1 - \frac{1}{N})]$ gives the result. □

Now that we understand the special case where M = 3, we tackle the general case: 

Proof of Theorem 25. As in the proof of Theorem 27, we relate packing $N$ unit vectors to packing $2N$ points in the hypersphere $S^{M-1} \subseteq \mathbb{R}^M$. The argument in the proof of Lemma 26 generalizes so that two of the $2N$ points must reside in some closed hyperspherical cap of hypersurface area $\frac{1}{2N}H^{M-1}(S^{M-1})$. Therefore, the smallest angle $\alpha$ between these points is no more than twice the radius of this cap. Let $C(\gamma)$ denote a hyperspherical cap of angular radius $\gamma$. Then we use hyperspherical coordinates to get

$$H^{M-1}(C(\gamma)) = \int_{\phi_1=0}^{\gamma}\int_{\phi_2=0}^{\pi}\cdots\int_{\phi_{M-1}=0}^{2\pi}\sin^{M-2}(\phi_1)\cdots\sin(\phi_{M-2})\,d\phi_{M-1}\cdots d\phi_1 = \frac{2\pi^{(M-1)/2}}{\Gamma(\frac{M-1}{2})}\int_0^{\gamma}\sin^{M-2}\phi\,d\phi. \tag{24}$$

We wish to solve for $\gamma$, but analytically inverting $\int\sin^{M-2}\phi\,d\phi$ is difficult. Instead, we use $\sin\phi \ge \frac{2\phi}{\pi}$ for $\phi \in [0, \frac{\pi}{2}]$. Note that we do not lose generality by forcing $\gamma \le \frac{\pi}{2}$, since this is guaranteed with $N \ge 2$. Continuing (24) gives

$$H^{M-1}(C(\gamma)) \ge \frac{2\pi^{(M-1)/2}}{\Gamma(\frac{M-1}{2})}\int_0^{\gamma}\Big(\frac{2\phi}{\pi}\Big)^{M-2}d\phi = \frac{(2\gamma)^{M-1}}{(M-1)\,\pi^{(M-3)/2}\,\Gamma(\frac{M-1}{2})}. \tag{25}$$

Using the formula for a hypersphere's hypersurface area, we can express the left-hand side of (25):

$$H^{M-1}(C(\gamma)) = \frac{1}{2N}H^{M-1}(S^{M-1}) = \frac{\pi^{M/2}}{N\,\Gamma(\frac{M}{2})}.$$

Isolating $2\gamma$ above and using $\alpha \le 2\gamma$ and $\mu_F = \cos\alpha$ gives (23). The second part of the result comes from a simple application of Stirling's approximation. □

In [18], numerical results are given for $M = 3$, and we compare these results to Theorems 24 and 25 in Figure 1. Considering this figure, we note that the bound in Theorem 24 is inferior to the maximum of the Welch bound and the bound in Theorem 25, at least when $M = 3$. This illustrates the degree to which Theorem 25 improves the bound in Theorem 24 for real frames. In fact, since $\cos(\frac{\pi}{a}) \ge 1 - \frac{2}{a}$ for all $a \ge 2$, the bound for real frames in Theorem 25 is asymptotically better than the bound for complex frames in Theorem 24. Moreover, for $M = 2$, Theorem 25 says $\mu_F \ge \cos(\frac{\pi}{N})$, and [7] proved this bound to be tight for every $N \ge 2$. Lastly, Figure 1 illustrates that Theorem 27 improves the bound in Theorem 25 for the case $M = 3$.
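The four lower bounds compared in Figure 1 are straightforward to evaluate. The following is a minimal sketch, assuming the Welch bound $\sqrt{(N-M)/(M(N-1))}$, the Theorem 24 bound $1 - 2N^{-1/(M-1)}$, the Theorem 25 bound $\cos[\pi(\frac{M-1}{N\sqrt{\pi}}\Gamma(\frac{M-1}{2})/\Gamma(\frac{M}{2}))^{1/(M-1)}]$, and the Theorem 27 bound $1 - 4/N + 2/N^2$; these forms are reconstructed from the proofs in this section, and the function names are hypothetical.

```python
import math


def welch_bound(M, N):
    """Welch bound on worst-case coherence of an M x N unit norm frame."""
    return math.sqrt((N - M) / (M * (N - 1)))


def theorem24_bound(M, N):
    """Assumed form of the bound in Theorem 24 (real or complex frames)."""
    return 1 - 2 * N ** (-1 / (M - 1))


def theorem25_bound(M, N):
    """Assumed form of the bound in Theorem 25 (real frames)."""
    ratio = (
        (M - 1) / (N * math.sqrt(math.pi))
        * math.gamma((M - 1) / 2) / math.gamma(M / 2)
    )
    return math.cos(math.pi * ratio ** (1 / (M - 1)))


def theorem27_bound(N):
    """Assumed form of the bound in Theorem 27 (real frames, M = 3)."""
    return 1 - 4 / N + 2 / N ** 2


# Tabulate the bounds for M = 3 over part of the range shown in Figure 1.
for N in (10, 25, 55):
    print(N, welch_bound(3, N), theorem24_bound(3, N),
          theorem25_bound(3, N), theorem27_bound(N))
```

For $M = 2$ the Theorem 25 expression reduces to $\cos(\pi/N)$, and for $M = 3$ it reduces to $\cos(2\sqrt{\pi/N})$, which gives two quick consistency checks on the reconstruction.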

In many applications, large dictionaries are built to obtain sparse reconstruction, but the known guarantees on sparse reconstruction place certain requirements on worst-case coherence. Asymptotically, the bounds in Theorems 24 and 25 indicate that certain exponentially large dictionaries will not satisfy these requirements. For example, if $N = \Theta(3^M)$, then $\mu_F = \Omega(\frac{1}{3})$ by Theorem 24, and if the frame is real, we have $\mu_F = \Omega(\frac{1}{2})$ by Theorem 25. Such a dictionary will only work for sparse reconstruction if the sparsity level $K$ is sufficiently small; deterministic guarantees require $K < \mu_F^{-1}$ [21, 48], while probabilistic guarantees require $K < \mu_F^{-2}$ [5, 49], and so in this example, the dictionary can, at best, only accommodate sparsity levels that are smaller than 10. Unfortunately, in real-world applications, we can expect the sparsity level to scale with the signal dimension. This in mind, Theorems 24 and 25 tell us that dictionaries can only be used for sparse reconstruction if $N = O((2 + \epsilon)^M)$ for some sufficiently small $\epsilon > 0$. To summarize, the Welch bound is known to be tight only if $N \le M^2$, and Theorems 24 and 25 give bounds which are asymptotically better than the Welch bound whenever $N = \Omega(2^M)$. When $N$ is between $M^2$ and $2^M$, the best bound to date is the (loose) Welch bound, and so more work needs to be done to bound worst-case coherence in this parameter region.



5. Reducing average coherence 

In 0, average coherence is used to derive a number of guarantees on sparse signal processing. Since average 
coherence is so new to the frame theory literature, this section will investigate how average coherence relates to 
worst-case coherence and the spectral norm. We start with a definition: 

Definition 28 (Wiggling and flipping equivalent frames). We say the frames F and G are wiggling equivalent if there 
exists a diagonal matrix D of unimodular entries such that G = FD. Furthermore, they are flipping equivalent if D is 
real, having only ±l's on the diagonal. 

The terms "wiggling" and "flipping" are inspired by the fact that individual frame elements of such equivalent 
frames are related by simple unitary operations. Note that every frame with N nonzero frame elements belongs to a 
flipping equivalence class of size $2^N$, while being wiggling equivalent to uncountably many frames. The importance
of this type of frame equivalence is, in part, due to the following lemma, which characterizes the shared geometry of 
wiggling equivalent frames: 

Lemma 29 (Geometry of wiggling equivalent frames). Wiggling equivalence preserves the norms of frame elements, 
the worst-case coherence, and the spectral norm. 

Proof. Take two frames F and G such that G = FD. The first claim is immediate. Next, the Gram matrices are 
related by G*G = D*F*FD. Since corresponding off-diagonal entries are equal in modulus, we know the worst-case 
coherences are equal. Finally, $\|G\|_2^2 = \|GG^*\|_2 = \|FDD^*F^*\|_2 = \|FF^*\|_2 = \|F\|_2^2$, and so we are done. □
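
As a numerical sanity check of Lemma 29, the sketch below wiggles a small frame and compares the quantities the lemma says are preserved. The 2 x 3 frame, the phases, and all function names are hypothetical choices made for illustration; frames are stored as lists of frame elements (columns). Rather than computing spectral norms directly, it compares the frame operators $FF^*$ and $GG^*$ entrywise, which is the identity the proof relies on.

```python
import cmath

def gram(frame):
    """Gram matrix F*F; `frame` is a list of frame elements (columns of F)."""
    return [[sum(a.conjugate() * b for a, b in zip(fi, fj)) for fj in frame]
            for fi in frame]

def frame_operator(frame):
    """M x M matrix FF*; equal frame operators imply equal spectral norms."""
    m = len(frame[0])
    return [[sum(f[r] * f[c].conjugate() for f in frame) for c in range(m)]
            for r in range(m)]

def worst_case_coherence(frame):
    """Largest modulus among the off-diagonal Gram entries."""
    g = gram(frame)
    n = len(frame)
    return max(abs(g[i][j]) for i in range(n) for j in range(n) if i != j)

def wiggle(frame, phases):
    """Return the columns of FD with D = diag(exp(1j * phase))."""
    return [[cmath.exp(1j * t) * x for x in f] for f, t in zip(frame, phases)]

s = 2 ** -0.5
# Hypothetical 2 x 3 unit norm frame (an assumption, not taken from the paper).
F = [[1 + 0j, 0j], [s + 0j, s * 1j], [s + 0j, -s + 0j]]
# Arbitrary unimodular phases on the diagonal of D.
G = wiggle(F, [0.3, 1.7, -2.1])

print(worst_case_coherence(F), worst_case_coherence(G))
```

The two printed coherences should agree up to floating-point rounding, and the same holds entrywise for `frame_operator(F)` and `frame_operator(G)`.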

Wiggling and flipping equivalence are not entirely new to frame theory. For a real equiangular tight frame F, 
the Gram matrix F*F is completely determined by the sign pattern of the off-diagonal entries, which can in turn be 
interpreted as the Seidel adjacency matrix of a graph $G_F$. As such, flipping a frame element $f \in F$ has the effect of
negating the corresponding row and column in the Gram matrix, which further corresponds to switching the adjacency
rule for that vertex $v_f \in V(G_F)$ in the graph: vertices are adjacent to $v_f$ after switching precisely when they were
not adjacent before switching. Graphs are called switching equivalent if there is a sequence of switching operations 
that produces one graph from the other; this equivalence was introduced in BTI and was later extensively studied by 
Seidel in B4ll45l . Since flipping equivalent real equiangular tight frames correspond to switching equivalent graphs, 
the terms have become interchangeable. For example, [15] uses switching (i.e., wiggling and flipping) equivalence to

Algorithm 2 Linear-time flipping

Input: An $M \times N$ unit norm frame $F$
Output: An $M \times N$ unit norm frame $G$ that is flipping equivalent to $F$

$g_1 \leftarrow f_1$ {Keep first frame element}
for $n = 2$ to $N$ do
    if $\|\sum_{i=1}^{n-1} g_i + f_n\| \le \|\sum_{i=1}^{n-1} g_i - f_n\|$ then
        $g_n \leftarrow f_n$ {Keep frame element to make sum length shorter}
    else
        $g_n \leftarrow -f_n$ {Flip frame element to make sum length shorter}
    end if
end for



make progress on an important problem in frame theory called the Paulsen problem, which asks how close a nearly 
unit norm, nearly tight frame must be to a unit norm tight frame. 

Now that we understand wiggling and flipping equivalence, we are ready for the main idea behind this section. 
Suppose we are given a unit norm frame with acceptable spectral norm and worst-case coherence, but we also want
the average coherence to satisfy (SCP-2). Then by Lemma 29, all of the wiggling equivalent frames will also have
acceptable spectral norm and worst-case coherence, and so it is reasonable to check these frames for good average
coherence. In fact, the following theorem guarantees that at least one of the flipping equivalent frames will have good
average coherence, with only modest requirements on the original frame's redundancy.

Theorem 30 (Constructing frames with low average coherence). Let $F$ be an $M \times N$ unit norm frame with $M < \frac{N-1}{4\log 4N}$.
Then there exists a frame $G$ that is flipping equivalent to $F$ and satisfies $\nu_G \le \frac{\mu_G}{\sqrt{M}}$.

Proof. Take $\{R_n\}_{n=1}^{N}$ to be a Rademacher sequence that independently takes values ±1, each with probability $\frac{1}{2}$. We
use this sequence to randomly flip $F$; define $Z := F \,\mathrm{diag}\{R_n\}_{n=1}^{N}$. Note that if $\Pr(\nu_Z \le \frac{\mu_F}{\sqrt{M}}) > 0$, we are done. Fix some
$i \in \{1, \ldots, N\}$. Then

\[
\Pr\Bigg[\frac{1}{N-1}\Bigg|\sum_{\substack{j=1 \\ j \neq i}}^{N} \langle R_i f_i, R_j f_j\rangle\Bigg| > \frac{\mu_F}{\sqrt{M}}\Bigg]
= \Pr\Bigg[\Bigg|\sum_{\substack{j=1 \\ j \neq i}}^{N} R_j \langle f_i, f_j\rangle\Bigg| > \frac{(N-1)\mu_F}{\sqrt{M}}\Bigg]. \tag{26}
\]

We can view $\sum_{j \neq i} R_j \langle f_i, f_j\rangle$ as a sum of $N - 1$ independent zero-mean complex random variables that are bounded
by $\mu_F$. We can therefore use a complex version of Hoeffding's inequality [30] (see, e.g., [4, Lemma 3.8]) to bound
the probability expression in (26) as $\le 4e^{-(N-1)/4M}$. From here, a union bound over all $N$ choices for $i$ gives
$\Pr(\nu_Z \le \frac{\mu_F}{\sqrt{M}}) \ge 1 - 4Ne^{-(N-1)/4M}$, and so $M < \frac{N-1}{4\log 4N}$ implies $\Pr(\nu_Z \le \frac{\mu_F}{\sqrt{M}}) > 0$, as desired. □
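
To see how the redundancy condition and the probability bound in this proof interact numerically, the following sketch evaluates both for a few dimension pairs. The function names and the chosen $(M, N)$ pairs are assumptions made for illustration; only the two formulas come from the proof above, and `math.log` is the natural logarithm.

```python
import math

def redundancy_threshold(n):
    """Right-hand side (N - 1) / (4 log 4N) of the hypothesis M < (N - 1) / (4 log 4N)."""
    return (n - 1) / (4 * math.log(4 * n))

def success_lower_bound(m, n):
    """Union-bound estimate 1 - 4N exp(-(N - 1) / (4M)) from the proof.

    This lower-bounds the probability that a uniformly random flip Z of an
    M x N unit norm frame F satisfies nu_Z <= mu_F / sqrt(M). A non-positive
    value means the bound is vacuous, not that no good flip exists.
    """
    return 1 - 4 * n * math.exp(-(n - 1) / (4 * m))

# Illustrative (M, N) pairs; these particular values are hypothetical choices.
for m, n in [(5, 10), (20, 1000), (30, 1000), (31, 1000)]:
    ok = m < redundancy_threshold(n)
    print(f"M={m:3d} N={n:5d} hypothesis holds: {ok}  bound: {success_lower_bound(m, n):.4f}")
```

The bound is positive exactly when the hypothesis holds, which is how the threshold in the theorem arises. For $N = 1000$ the hypothesis admits $M$ up to 30, and when the bound is close to 1, drawing a few random flips and keeping the best is already a reasonable heuristic.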



While Theorem 30 guarantees the existence of a flipping equivalent frame with good average coherence, the result
does not describe how to find it. Certainly, one could check all $2^N$ frames in the flipping equivalence class, but such
an exhaustive search takes time exponential in $N$. As an alternative, we propose a linear-time flipping algorithm (Algorithm 2).
The following theorem guarantees that linear-time flipping will produce a frame with good average coherence, but it
requires the original frame's redundancy to be higher than what suffices in Theorem 30.



Theorem 31. Suppose $N > M^2 + 3M + 3$. Then Algorithm 2 outputs an $M \times N$ frame $G$ that is flipping equivalent to
$F$ and satisfies $\nu_G \le \frac{\mu_G}{\sqrt{M}}$.

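Proof. Considering Lemma 7(iii), it suffices to have $\|\sum_{n=1}^{N} g_n\|^2 \le N$. We will use induction to show $\|\sum_{n=1}^{k} g_n\|^2 \le k$
for $k = 1, \ldots, N$. Clearly, $\|\sum_{n=1}^{1} g_n\|^2 = \|f_1\|^2 = 1 \le 1$. Now assume $\|\sum_{n=1}^{k} g_n\|^2 \le k$. Then by our choice for $g_{k+1}$ in
Algorithm 2, we know that $\|\sum_{n=1}^{k} g_n + g_{k+1}\|^2 \le \|\sum_{n=1}^{k} g_n - g_{k+1}\|^2$. Expanding both sides of this inequality gives

\[
\Big\|\sum_{n=1}^{k} g_n\Big\|^2 + 2\,\mathrm{Re}\Big\langle \sum_{n=1}^{k} g_n, g_{k+1}\Big\rangle + \|g_{k+1}\|^2
\le \Big\|\sum_{n=1}^{k} g_n\Big\|^2 - 2\,\mathrm{Re}\Big\langle \sum_{n=1}^{k} g_n, g_{k+1}\Big\rangle + \|g_{k+1}\|^2,
\]

and so $\mathrm{Re}\langle \sum_{n=1}^{k} g_n, g_{k+1}\rangle \le 0$. Therefore,

\[
\Big\|\sum_{n=1}^{k+1} g_n\Big\|^2
= \Big\|\sum_{n=1}^{k} g_n\Big\|^2 + 2\,\mathrm{Re}\Big\langle \sum_{n=1}^{k} g_n, g_{k+1}\Big\rangle + \|g_{k+1}\|^2
\le \Big\|\sum_{n=1}^{k} g_n\Big\|^2 + \|g_{k+1}\|^2
\le k + 1,
\]

where the last inequality uses the inductive hypothesis. □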
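
Example 32. As an example of how linear-time flipping reduces average coherence, consider the following matrix:

\[
F := \frac{1}{\sqrt{5}}
\begin{bmatrix}
+ & + & + & + & - & + & + & + & + & - \\
+ & - & + & + & + & - & - & - & + & - \\
+ & + & + & + & + & + & + & + & - & + \\
- & - & - & + & - & + & + & - & - & - \\
- & + & + & - & - & + & - & - & - & -
\end{bmatrix}.
\]

Here, $\nu_F \approx 0.3778 > 0.2683 \approx \frac{\mu_F}{\sqrt{M}}$. Even though $N < M^2 + 3M + 3$, we run linear-time flipping to get the flipping
pattern $D := \mathrm{diag}(+, -, +, -, -, +, +, -, +, +)$. Then $FD$ has average coherence $\nu_{FD} \approx 0.1556 < \frac{\mu_F}{\sqrt{M}} = \frac{\mu_{FD}}{\sqrt{M}}$. This example
illustrates that the condition $N > M^2 + 3M + 3$ in Theorem 31 is sufficient but not necessary.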
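
The computation in Example 32 can be reproduced with a short sketch of Algorithm 2. It assumes the sign matrix as transcribed from the example and works with the integer columns of $\sqrt{M}\,F$ so that all comparisons are exact; the flipping decisions do not change under a common positive scaling. The average coherence is taken to be $\nu_F = \frac{1}{N-1}\max_i |\sum_{j \neq i} \langle f_i, f_j\rangle|$, as in the proof of Theorem 30. All function and variable names are hypothetical.

```python
from fractions import Fraction
import math

# Sign pattern of sqrt(5) * F, one string per row (transcribed from Example 32).
ROWS = [
    "++++-++++-",
    "+-+++---+-",
    "++++++++-+",
    "---+-++---",
    "-++--+----",
]
# Integer columns of sqrt(M) * F; dividing a dot product by M gives <f_i, f_j>.
COLS = [[1 if row[n] == "+" else -1 for row in ROWS] for n in range(len(ROWS[0]))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def worst_case_coherence(cols):
    m, n = len(cols[0]), len(cols)
    return max(abs(Fraction(dot(cols[i], cols[j]), m))
               for i in range(n) for j in range(i + 1, n))

def average_coherence(cols):
    m, n = len(cols[0]), len(cols)
    total = [sum(c[k] for c in cols) for k in range(m)]
    # For unit norm frames, sum_{j != i} <f_i, f_j> = <f_i, sum_j f_j> - 1.
    return max(abs(Fraction(dot(c, total), m) - 1) for c in cols) / (n - 1)

def linear_time_flipping(cols):
    """Return the list of signs (+1 keep, -1 flip) chosen by Algorithm 2."""
    signs, running = [1], list(cols[0])  # keep the first frame element
    for f in cols[1:]:
        plus = [r + x for r, x in zip(running, f)]
        minus = [r - x for r, x in zip(running, f)]
        # Keep f if that leaves the running sum no longer than flipping would.
        sign = 1 if dot(plus, plus) <= dot(minus, minus) else -1
        signs.append(sign)
        running = plus if sign == 1 else minus
    return signs

signs = linear_time_flipping(COLS)
flipped = [[s * x for x in c] for s, c in zip(signs, COLS)]

print("flipping pattern:", "".join("+" if s == 1 else "-" for s in signs))
print("nu_F       =", float(average_coherence(COLS)))
print("nu_FD      =", float(average_coherence(flipped)))
print("mu_F/sqrt5 =", float(worst_case_coherence(COLS)) / math.sqrt(len(ROWS)))
```

On this transcription, the printed values agree with the figures quoted in the example (about 0.3778, 0.1556, and 0.2683). Ties in the norm comparison are resolved by keeping the frame element, matching the non-strict inequality used in the proof of Theorem 31.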